Purpose - One day I decided to time MATLAB's built-in factorial() and gamma() functions and learned that the gamma function is ridiculously fast. So fast, in fact, that MATLAB warns the measurement may be inaccurate: it comes in on the order of tens of billionths of a second (7.9614e-08 sec), which was around 10x faster than the factorial function (when calculating 100!). That didn't make sense to me: how is the gamma function, which is defined by an integral, quicker than multiplying several numbers together? This prompted multiple attempts at recreating MATLAB's speed in calculating the factorial, which ended with a function on the same order of magnitude as MATLAB's.
For those who don't know, the relationship between the Gamma Function and the Factorial is defined below:
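$$n! = \Gamma(n+1) = \int_0^{\infty} x^{\,n} e^{-x}\,dx$$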
Because I wanted to compare each implementation to the others, I measured both precision and execution time. Below are the methodologies, followed by the code. I ran each implementation on various values of n: for the implementations that only work with positive integers it was every integer from 0 to 102, and for the rest it was the same range but with a step size of 0.1, so 0, 0.1, 0.2, 0.3, ..., 102. The value 102 was chosen because 103 caused some of the implementations to output <missing>. Negatives weren't tested because most implementations output <missing>, Inf, or simply very wrong values, and the gamma function is not defined at negative integers, which would add extra complexity for only the two implementations that handle negatives correctly.
Timing - Timing was fairly straightforward using MATLAB's built-in timeit() function. That said, I had ambitions of timing each call multiple times and taking an average, but that was too slow for the desired number of points. It should also be noted that, as mentioned before, some implementations were so quick that timeit() occasionally output the following warning:
Warning: The measured time for F may be inaccurate because it is running too fast. Try measuring
something that takes longer.
> In timeit (line 158)
In Factorial_Timing (line 18)
Precision - Precision was trickier than timing, since none of the implementations were 100% exact, which is what prompted me to measure it. After some trial and error I settled on using the vpaintegral() function, which for select values matched Wolfram Alpha's answer. The full function is shown below along with a table of select values:
syms f(x)
f(x) = x^n * exp(-x);                                % n is the value whose factorial (n!) is being computed
vpaintegral(f,0,1e3, 'RelTol', 1e-50, 'AbsTol', 0)   % high-precision reference value for n!
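All precision numbers reported below are percent errors relative to this reference value, as computed in testFunc():

$$\text{error}\ (\%) = \left|\frac{\text{computed} - \text{exact}}{\text{exact}}\right| \times 100$$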
function ret = time()
    max_test = 102;
    delete factorial_timing.xls

    % Integer-only implementations, n = 0, 1, 2, ..., 102
    test = 0:max_test;
    headers = {'n', 'Matlab Factorial', 'Loop', 'Vector', 'Recursion'};
    writecell(headers,'factorial_timing.xls','Sheet','Integer Precision','WriteMode','append')
    writecell(headers,'factorial_timing.xls','Sheet','Integer Timing','WriteMode','append')
    for j = 1:length(test)
        i = test(j);
        data = ones(2,length(headers)) * 100;
        data(:,1) = i;
        data(:,2) = testFunc(@() factorial(i), i);
        data(:,3) = testFunc(@() loop(i), i);
        data(:,4) = testFunc(@() vector(i), i);
        data(:,5) = testFunc(@() recursion(i), i);
        writematrix(data(1,:),'factorial_timing.xls','Sheet','Integer Timing', 'WriteMode','append')
        writematrix(data(2,:),'factorial_timing.xls','Sheet','Integer Precision', 'WriteMode','append')
        disp(j / length(test) * 100)   % progress in percent
    end

    % Real-valued implementations, n = 0, 0.1, 0.2, ...
    max_test = 100;
    test = 0:0.1:max_test;
    headers = {'n', 'Matlab Gamma', 'Matlab Integral', 'Simpsons Rule', 'Stirlings Formula', 'Lanczos Approximation', 'Exact'};
    writecell(headers,'factorial_timing.xls','Sheet','Real Precision','WriteMode','append')
    writecell(headers,'factorial_timing.xls','Sheet','Real Timing','WriteMode','append')
    for j = 1:length(test)
        i = test(j);
        data = ones(2,length(headers)) * 100;
        data(:,1) = i;
        data(:,2) = testFunc(@() gamma(i + 1), i);
        data(:,3) = testFunc(@() integral(@(x) x.^i .* exp(-x),0,10e2), i);
        data(:,4) = testFunc(@() simpsons_rule(i,0,1e3), i);
        data(:,5) = testFunc(@() stirling(i), i);
        data(:,6) = testFunc(@() lanczos(i), i);
        syms f(x)
        f(x) = x^i * exp(-x);
        data(:,7) = testFunc(@() vpaintegral(f,0,1e3, 'RelTol', 1e-50, 'AbsTol', 0), i);
        writematrix(data(1,:),'factorial_timing.xls','Sheet','Real Timing', 'WriteMode','append')
        writematrix(data(2,:),'factorial_timing.xls','Sheet','Real Precision', 'WriteMode','append')
        disp(j / length(test) * 100)   % progress in percent
    end
end
function [ret] = testFunc(func, n)
    % Returns ret = [execution time in seconds, percent error], where func
    % is a handle that computes n!
    ret = [0 0];

    % Reference value of n! from high-precision numeric integration
    syms f(x)
    f(x) = x^n * exp(-x);
    exactValue = vpaintegral(f,0,1e3, 'RelTol', 1e-50, 'AbsTol', 0);

    evaluatedValue = func();
    ret(2) = abs((evaluatedValue - exactValue) / exactValue) * 100;

    trials = 1;
    data = zeros(1,trials);
    for i = 1:trials
        data(i) = timeit(func);
    end
    ret(1) = mean(data);
end
clc
clear
clf
warning('off')

file = "factorial_timing.xlsx";
intPrec = readtable(file, "Sheet","Integer Precision");
intTime = readtable(file, "Sheet","Integer Timing");
realPrec = readtable(file, "Sheet","Real Precision");
realTime = readtable(file, "Sheet","Real Timing");

n = 10;
m = false;

hold on
title('Precision of Loop', 'FontSize', 20)
xlabel('n', 'FontWeight','bold', 'FontSize', 18)
xlim([0 100])
ax = gca;
ax.XAxis.FontSize = 15;
ax.YAxis.FontSize = ax.XAxis.FontSize;
ax.YAxis.Exponent = 0;
ylabel('Time (sec)', 'FontWeight','bold', 'FontSize', 18)

if m == true
    intPrec = avTable(intPrec, 5);
    intTime = avTable(intTime, 5);
    realPrec = avTable(realPrec, 100);
    realTime = avTable(realTime, 100);
end

plot(intPrec.n, intPrec.MatlabFactorial, 'LineWidth',3)
%legend({'Loop','LanczosApproximation', 'Matlab Gamma'},'Location','northwest')
disp('Done')

function ret = avTable(T, n)
    var = T.Variables;
    cols = length(var(1,:));
    rows = length(var(:,1));
    %n = ceil(rows / n);
    for col = 1:cols
        T.(col) = movmean(T.(col), n);
    end
    ret = T;
end
Precision
Timing (moving average of 5)
This method uses MATLAB's built-in factorial function, the code of which is below. It appears to find the max value in the input vector, calculate every factorial up to that value, and then pick out and return the requested values. It calculates the factorials through repeated multiplication, although over a vector using cumprod(). MATLAB's own documentation also cites a maximum of 15 digits of accuracy for numbers larger than 21, which matches the measured precision.
function n = factorial(n)
%FACTORIAL Factorial function.
% FACTORIAL(N) for scalar N, is the product of all the integers from 1 to N,
% i.e. prod(1:N). When N is an N-D matrix, FACTORIAL(N) is the factorial for
% each element of N. Since double precision numbers only have about
% 15 digits, the answer is only accurate for N <= 21. For larger N,
% the answer will have the correct order of magnitude, and is accurate for
% the first 15 digits.
%
% Class support for input N:
% float: double, single
% integer: uint8, int8, uint16, int16, uint32, int32, uint64, int64
%
% See also PROD.
% Copyright 1998-2012 The MathWorks, Inc.
N = n(:);
if ~isreal(n) || any(fix(N) ~= N) || any(N < 0)
error(message('MATLAB:factorial:NNegativeInt'))
end
if isa(n,'double')
thres = 171;
elseif isa(n, 'single')
thres = 35;
elseif isinteger(n)
thres = 21;
else
error(message('MATLAB:factorial:unsupportedType'));
end
N = min(N,thres);
m = max(1, max(N));
Fa = cumprod([1 1 2:m]);
n(:) = Fa(N+1);
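As a minimal illustration of the cumprod() lookup idea described above (my own sketch, not part of MathWorks' code):

N = [5 10 100];                % requested values
Fa = cumprod([1 1 2:max(N)]);  % Fa(k+1) = k!, built in one cumulative product
Fa(N + 1)                      % returns 5!, 10!, 100!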
Precision
Timing (moving average of 5)
The loop implementation is very simple in how it works, but it was very quick and took a lot of effort to beat. It scales linearly and seems to have the same precision as the factorial() function, so it is reasonable to assume that the loop implementation is also exact up to n = 21.
function fact = loop(n)
fact = 1;
for i = 2:n
fact = fact * i;
end
end
Precision
Timing (moving average of 5)
I learned during the Prime Sieve deep dive and Simpson's Rule that MATLAB is occasionally faster when working with vectors than when using a loop. However, that isn't the case here, although there are some benefits, like being a single line of code and probably scaling better than the loop implementation. However, MATLAB will only calculate the factorial up to 170 before outputting Inf, which is not enough for the vector implementation to become faster than the loop.
function fact = vector(n)
fact = prod(1:n);
end
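The Inf cutoff is easy to verify, since 171! is larger than the biggest double (realmax is roughly 1.8e308):

prod(1:170)   % 7.2574e+306, still representable as a double
prod(1:171)   % Inf, since 171! overflows double precision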
Precision
Timing (moving average of 5)
Admittedly, this one was mostly for fun, as well as to see how MATLAB handles recursion. It scales linearly, as you would expect, and sticks with the trend of having the same precision as the previous implementations. It is also slower than the loop implementation, which is what I would expect.
function fact = recursion(n)
if (n <= 1)
fact = 1;
return
end
fact = n * recursion(n - 1);
end
Precision (moving average of 10)
Timing (moving average of 10)
This implementation is the first one to use the Gamma Function definition of the factorial, which can take in any number, and was done using MATLAB's built-in gamma() function. As the code isn't freely available there is no listing to show; however, MATLAB's documentation cites
Cody, J., An Overview of Software Development for Special Functions, Lecture Notes in Mathematics, 506, Numerical Analysis Dundee, G. A. Watson (ed.), Springer Verlag, Berlin, 1976.
for its algorithm, which I believe I tracked down for interested readers. To my limited understanding, it uses different approximations over different ranges of the input to get the best performance and precision, which can be seen in the timing graph as the time steadily increases before stabilizing at around n = 10.
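For reference, evaluating a factorial through gamma() is just a shifted call, which is how it was timed here:

n = 100;
gamma(n + 1)               % 100!, using n! = gamma(n + 1)
timeit(@() gamma(n + 1))   % the quantity reported in the timing graphs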
Precision
Timing (moving average of 10)
My first thought for why the gamma function is so quick was that MATLAB can calculate integrals quickly, so I timed the integral() function. That turned out not to be the explanation: the gamma function is both more accurate and faster than the integral() call, although the integral's precision can be increased. How exactly the function works is beyond me, but the code is viewable for those interested.
n = 4;
fun = @(x) x.^n .* exp(-x);
integral(fun,0,10e2)
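If precision matters more than speed, integral() also accepts tolerance options; a sketch with tightened tolerances (the specific tolerance values here are just an illustration):

integral(fun, 0, 10e2, 'RelTol', 1e-12, 'AbsTol', 1e-15)   % tighter than the defaults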
Precision (moving average of 5)
Timing (moving average of 10)
This implementation uses Simpson's Rule, which, similarly to the recursive implementation, was partly for fun but was also a test to see if MATLAB is doing some sort of integral approximation. After choosing an n value (number of subintervals) such that the function calculates 0.5! to 5 decimal places, which is how many the other implementations would display, it was obvious that this wasn't the solution. Even after slight optimizations over the previous project, tailoring it to this specific integral rather than an arbitrary function, it was still far behind in terms of timing, though surprisingly close to the gamma function in precision.
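As a reminder, composite Simpson's rule with n subintervals of width Δx = (b − a)/n weights the interior points with alternating 4's and 2's:

$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 4f(x_{n-1}) + f(x_n)\right]$$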
function fact = simpsons_rule(z, a, b)
n = 1211534;
deltaX = (b - a) / n;
values = linspace(a + deltaX, b - deltaX, n - 1);
y = values.^z .* exp(-values);
odd = 4 .* y(1:2:end);
even = 2 .* y(2:2:end);
fact = (deltaX / 3) * (a^z * exp(-a) + b^z * exp(-b) + sum(odd) + sum(even));
end
Precision
Timing (moving average of 10)
This implementation uses Stirling's Approximation, which gives the factorial as a single closed-form expression. The problem is that, due to the heavier math, it is only barely faster than the loop implementation and still slower than the gamma function. It is also quite inaccurate, providing only around 2 digits of accuracy, although it does have the most satisfying precision graphs. Taking what I have learned, it could probably be made faster, but it is too inaccurate to be of much use.
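The exact form used in the code below is the sqrt((2n + 1/3)π) refinement of Stirling's formula (Gosper's version), rather than the more common sqrt(2πn):

$$n! \approx \sqrt{\left(2n + \tfrac{1}{3}\right)\pi}\;\, n^{n} e^{-n}$$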
function fact = stirling(n)
fact = sqrt((2 * n + 1/3) * pi) * n^n * exp(-n);
end
This implementation is the main one I worked on in terms of making it quicker and more precise, and overall the one I learned the most from. It uses the Lanczos approximation across two functions: one is an attempt at calculating the coefficients used, and the second actually implements the part that calculates the factorial. First, a quick overview of the method; below are the main equations used.
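In the form used throughout this section (the standard Lanczos approximation, with free parameter g and coefficients p0, p1, p2, ...):

$$z! = \Gamma(z+1) \approx \sqrt{2\pi}\,\left(z+g+\tfrac{1}{2}\right)^{z+\frac{1}{2}} e^{-\left(z+g+\frac{1}{2}\right)}\, A_g(z)$$

$$A_g(z) = p_0 + \frac{p_1}{z+1} + \frac{p_2}{z+2} + \frac{p_3}{z+3} + \cdots$$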
Where p0, p1, p2, and so on are the coefficients mentioned above, which can be pre-calculated to improve performance. So first, let's discuss the code that calculates the coefficients. I mainly followed the steps laid out by Paul Godfrey; in his example he used g = 5 and calculated 7 coefficients, which I will likewise use here. Below is the equation for the kth coefficient.
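Reconstructed from the code below and Godfrey's write-up, each coefficient is a weighted sum of the terms Fg(a) (up to the C(1,1) = 1/2 adjustment handled in the code):

$$p_k = \sum_{a=0}^{k} C_{2k,2a}\, F_g(a), \qquad F_g(a) = \sqrt{\frac{2}{\pi}}\;\frac{(2a-1)!!\; e^{\,a+g+\frac{1}{2}}}{2^{a}\left(a+g+\tfrac{1}{2}\right)^{a+\frac{1}{2}}}$$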
Note that (2a - 1)!! is the double factorial of (2a - 1). The code for Fg(a) is shown below:
%Calculates Fg(a)
a = 0:n-1;
f = sqrt(2/sym(pi)) * (doubleFact(2*a-1).*my_exp(a+g+sym(1)/2))./(2.^a.*(a+g+1/2).^(a+1/2));
and it does give Godfrey's answer.
I also implemented my own e^x function, along with other tricks, in order to get more precision; this will be discussed further later. The next step is to calculate C(2k,2a), the even columns and even rows of the coefficients of the Chebyshev polynomials; the code is below.
%Calculates Coefficients of the Chebyshev polynomials
%https://en.wikipedia.org/wiki/Lanczos_approximation
C = zeros(2*n-1,2*n-1);
%Gets the Chebyshev polynomials Coefficients
for N = 1:length(C)
for M = 1:length(C)
C(N,M) = T(N,M);
end
end
%Gets the even rows and columns of C
C = C(1:2:end, 1:2:end);
C(1,1) = 0.5;
This does give Godfrey's answer
Thus pk = C * Fg(a). Through some math and reworking, we have that Ag(z) = Z * D * B * pk, where Z, D, and B are as follows.
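Here Z is the row vector of basis terms that turns the coefficient vector into the partial-fraction series used in the final code (my reconstruction from that code):

$$Z(z) = \begin{bmatrix} 1 & \dfrac{1}{z+1} & \dfrac{1}{z+2} & \cdots & \dfrac{1}{z+n-1} \end{bmatrix}$$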
D is essentially the Sloane sequence (placed on a diagonal matrix), and B is built from the odd columns of the binomial coefficients, with alternating signs.
Therefore the coefficients are c = (D * B * C) * Fg(a); note that this is matrix multiplication, not element-wise multiplication. The coefficients for g = 9 and n = 11, which Godfrey found to be the most precise values, are below.
You may notice that my values and Godfrey's values are slightly off from each other. Despite my best efforts I couldn't figure out why they differ, but to my knowledge Godfrey's values are the more accurate ones, so I used those instead of my own when comparing against the gamma function.
%Calculates the coefficients for the approximation
%Main Steps: https://www.numericana.com/answer/info/godfrey.htm
function p = lancoz_coef(n,g)
digits(128)
%Calculates Fg(a)
a = 0:n-1;
f = sqrt(2/sym(pi)) * (doubleFact(2*a-1).*my_exp(a+g+sym(1)/2))./(2.^a.*(a+g+1/2).^(a+1/2));
%Version 1 that was less precise
% a = 0:n-1;
% f = vpa(sqrt(2)/pi)*gamma(a-0.5+1).*(a+g+0.5).^-(a+0.5).*exp(a+g+0.5);
% disp(vpa(f(1)))
%Version 2 that was less precise
% f = zeros(1,n);
% for i = 0:n-1
% f(i+1) = vpa(sqrt(2)/pi)*halfGamma(i-0.5+1)*(i+g+0.5)^-(i+0.5)*vpa(exp(i+g+0.5));
% end
%Calculates Coefficients of the Chebyshev polynomials
%https://en.wikipedia.org/wiki/Lanczos_approximation
C = zeros(2*n-1,2*n-1);
%Gets the Chebyshev polynomials Coefficients
for N = 1:length(C)
for M = 1:length(C)
C(N,M) = T(N,M);
end
end
%Gets the even rows and columns of C
C = C(1:2:end, 1:2:end);
C(1,1) = 0.5;
%Calculates D the Sloane sequence
D = zeros(n);
D(1,1) = 1;
for i = 0:n-2
D(i+2,i+2) = -factorial(2*i+2) ./ (2*factorial(i).*factorial(i+1));
end
%Calculates the Binomial Coefficients
%https://en.wikipedia.org/wiki/Binomial_coefficient
B = zeros(n);
%Trivial first row
B(1,:) = 1;
%Creates a vector with alternating 1's and -1's
alt = (-1).^(0:n-1);
%Gets the rows of B which come from the odd columns of the Binomial
%Coefficients and alternates the sign
for i = 2:n
row = i+i-3:n+i-3;
col = (i-2)*2+1;
B(i,i:end) = alt(1:end-i+1).*factorial(row) ./ (factorial(col) .* factorial(row - col));
end
%Multiplies the matrices to get a final answer for the
%coefficients
p = (D * B * C) * vpa(sym(f)');
disp(round(p,21))
end
%My implementation of exp(n), intended to give more precise answers
%https://en.wikipedia.org/wiki/E_(mathematical_constant)
function exp = my_exp(n)
digits(128)
k = 300;
exp = zeros(1,length(n));
a = vpa(1:k);
fact = factorial(a);
for i = 1:length(n)
exp(i) = vpa(1 + sum((n(i).^a ./ fact)));
end
end
%Made to calculate the gamma function of a half, ex: 9.5, 6.5, 0.5
function fact = halfGamma(n)
digits(128)
if n == 0.5
fact = vpa(sqrt(pi));
return
else
d = vpa(n-1);
fact = d * halfGamma(n-1);
end
end
%Chebyshev Polynomial Coefficients through recursion
%https://en.wikipedia.org/wiki/Lanczos_approximation#Coefficients
function ret = T(n,m)
if n <= 1 && m == 1
ret = 1;
elseif n == 2 && m == 2
ret = 1;
elseif m == 1
ret = -T(n-2,1);
elseif m == n
ret = 2 * T(n-1,m-1);
elseif n >= 1
ret = 2 * T(n-1,m-1) - T(n-2,m);
else
ret = 0;
end
end
%Implements the double factorial
%https://mathworld.wolfram.com/DoubleFactorial.html
function fact = doubleFact(n)
fact = zeros(1, length(n));
for i = 1:length(n)
term = n(i);
if term == -1 || term == 0
fact(i) = 1;
elseif mod(term,2) == 0
fact(i) = prod(2:2:term);
else
fact(i) = prod(1:2:term);
end
end
end
Now, having discussed the coefficient code, we can move on to the actual calculation, that being computing the factorial itself. First, a reminder of the new calculation.
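With g = 9 (so the g + 1/2 term is the 9.5 that appears in the code), the quantity being evaluated is:

$$z! \approx \sqrt{2\pi}\,(z+9.5)^{z+\frac{1}{2}}\, e^{-(z+9.5)}\, A_g(z), \qquad A_g(z) = p_0 + \frac{p_1}{z+1} + \cdots + \frac{p_{10}}{z+10}$$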
There are many ways to code this, some quicker than others. First, let's look at the part right before Ag(z). I originally coded it as shown:
temp = z+9.5;
comm = sqrt(2 * pi)*temp^z*sqrt(temp)*exp(-temp);
in an attempt to reduce the number of calculations and lean on built-in functions. I also learned that the slowest calculation is (z + 9.5)^z, because when z = 100, as in my tests, that is 109.5^100, which is a very expensive operation. That is when I came up with the idea of making the base a constant, ideally e, since MATLAB has a built-in exp() function. The following algebra shows the rewrite.
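Writing t = z + 9.5 and using t^z = e^(z ln t):

$$\sqrt{2\pi}\; t^{z}\,\sqrt{t}\; e^{-t} \;=\; \sqrt{2\pi t}\;\, e^{z\ln t}\, e^{-t} \;=\; \sqrt{2\pi t}\;\, e^{\,z\ln t - t}$$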
After further simplification, as well as splitting the computation across lines (both to time it and as a possible performance boost), we arrive at the final version:
temp = z+9.5;
comm = sqrt(2 * pi * temp);
comm = comm * exp(z * log(temp) - temp);
We see that the execution time dropped from around 0.00013689 ms to 7.0418e-05 ms for 100!, or a 64.1287% difference. The next significant optimization came from not using a loop: I originally had the coefficients in a vector and iterated through them, as seen below, with a mild decrease in the number of calculations:
c = [ %1.000000000000000174663;
5716.400188274341379136;
-14815.30426768413909044;
14291.49277657478554025;
-6348.160217641458813289;
1301.608286058321874105;
-108.1767053514369634679;
2.605696505611755827729;
-0.7423452510201416151527e-2;
0.5384136432509564062961e-7;
-0.4023533141268236372067e-8];
fact = 1.000000000000000174663;
for i = 1:10
%fact = fact + c(i) / (z + i - 1);
temp = c(i) / (z + i);
temp = 1 / (z + i);
temp = temp * c(i);
fact = fact + temp;
end
fact = fact * comm;
However, removing the loop and instead hard-coding the iterations, as seen below,
fact = 1.000000000000000174663;
fact = fact + 5716.400188274341379136 / (z + 1);
fact = fact - 14815.30426768413909044 / (z + 2);
fact = fact + 14291.49277657478554025 / (z + 3);
fact = fact - 6348.160217641458813289 / (z + 4);
fact = fact + 1301.608286058321874105 / (z + 5);
fact = fact - 108.1767053514369634679 / (z + 6);
fact = fact + 2.605696505611755827729 / (z + 7);
fact = fact - 0.007423452510201416151527 / (z + 8);
fact = fact + 0.00000005384136432509564062961 / (z + 9);
fact = fact - 0.000000004023533141268236372067 / (z + 10);
fact = fact * comm;
we see that 8.5678e-05 ms becomes 7.3558e-05 ms, or around a 15.2227% difference.
%Uses the Lanczos Approximation
%About: https://en.wikipedia.org/wiki/Lanczos_approximation
%Main Steps: https://www.numericana.com/answer/info/godfrey.htm
function fact = lanczos(z)
%g = 9;
%n = 11;
% p = [1.000000000434457002999
% 5716.40018826882469642082
% -14815.304267303827193624659
% 14291.492770243227957671811
% -6348.160174678220466185036
% 1301.608146277464603149011
% -108.176491813501910571434
% 2.605616416699605525942
% -0.007600138516935476197
% 0.000223334913969033078
% -0.000077312348075020927];
% c = [ %1.000000000000000174663;
% 5716.400188274341379136;
% -14815.30426768413909044;
% 14291.49277657478554025;
% -6348.160217641458813289;
% 1301.608286058321874105;
% -108.1767053514369634679;
% 2.605696505611755827729;
% -0.7423452510201416151527e-2;
% 0.5384136432509564062961e-7;
% -0.4023533141268236372067e-8];
%p = double(lancoz_coef(n,g));
%Previous Version that is slower
%comm = sqrt(2 * pi)*temp^z*sqrt(temp)*exp(-temp);
%Originally used a loop but for speed did it manually
% for i = 1:10
% %fact = fact + c(i) / (z + i - 1);
% temp = c(i) / (z + i);
% temp = 1 / (z + i);
% temp = temp * c(i);
% fact = fact + temp;
% end
% fact = fact * comm;
temp = z+9.5;
comm = sqrt(2 * pi * temp);
comm = comm * exp(z * log(temp) - temp);
fact = 1.000000000000000174663;
fact = fact + 5716.400188274341379136 / (z + 1);
fact = fact - 14815.30426768413909044 / (z + 2);
fact = fact + 14291.49277657478554025 / (z + 3);
fact = fact - 6348.160217641458813289 / (z + 4);
fact = fact + 1301.608286058321874105 / (z + 5);
fact = fact - 108.1767053514369634679 / (z + 6);
fact = fact + 2.605696505611755827729 / (z + 7);
fact = fact - 0.007423452510201416151527 / (z + 8);
fact = fact + 0.00000005384136432509564062961 / (z + 9);
fact = fact - 0.000000004023533141268236372067 / (z + 10);
fact = fact * comm;
end
Precision (moving average of 10)
Timing (moving average of 10)
Gamma and Lanczos got a moving average of 10; Loop got a moving average of 5
As one would expect, I didn't revolutionize anything; however, I came very close to MATLAB's gamma function and firmly beat the loop implementation, which I am very proud of.