Purpose - I was watching "CPU vs GPU: Which is More Powerful?" by Dave's Garage and learned how the Sieve of Eratosthenes works and thought that it fit MATLAB's strengths of vectors and matrixes perfectly. That then started a quest for optimization and seeing how quickly I can make the code.
clear
clc
clf
close all
format long
%3000000 will take around 1021.326412 sec and have 216816 primes
tic
max = 3000000;
stoppingPoint = ceil(sqrt(max));
allNumbers = linspace(2,max, max - 1);
testNumbers = linspace(2, stoppingPoint, stoppingPoint - 1);
multiplyNumbers = reshape(allNumbers(1:end), [], 1);
removeNumber = multiplyNumbers .* testNumbers;
primeNumbers = setdiff(allNumbers, removeNumber);
toc
disp(length(primeNumbers))
disp("Done")
The first attempt was aiming to get a vector of all non prime numbers then make a new vector excluding these values. However I learned that, one I didn't implement the getting a vector of non primes well as I thought a for loop was inefficient so used a matrix that ended up being max by sqrt(max) large, second setdiff() is very slow.
clear
clc
clf
close all
format long
%3000000 will take around 7.434608 sec and have 216816 primes
tic
max = 3000000;
stoppingPoint = ceil(sqrt(max));
numbers = linspace(2,max, max - 1);
for i = 2:stoppingPoint
remove = i:i:max;
remove(1) = [];
numbers = setdiff(numbers, remove);
end
toc
disp(length(numbers))
disp("Done")
The main goal with this one was to not do the multiplying matrixes as that was both slow and used a lot of memory. This was accomplished by using a for loop which I tried to originally avoid however ended up being quicker. However I was still using setdiff() which is still very slow.
clear
clc
clf
close all
format long
tic
%3000000 will take around 14.014116 sec and have 216816 primes
max = 3000000;
numbers = linspace(1,max, max);
primes = isprime(numbers);
numbers = numbers(primes);
toc
disp(length(numbers))
disp("Done")
Feeling very accomplished at going from 1000 sec to 7 sec I tried using MATLAB's built in isprime() function. Which to my surprise was significantly slower, I have since learned of primes() which is quicker then Version 7, but at the time thought isprime() was the only built-in function that could be used for a prime sieve.
clear
clc
clf
close all
format long
%3000000 will take around 2.191731 sec and have 216816 primes
tic
max = 10000000;
stoppingPoint = ceil(sqrt(max));
numbers = linspace(2,max, max - 1);
for i = 2:stoppingPoint
numbers(numbers ~= i & mod(numbers, i) == 0) = [];
end
toc
disp(length(numbers))
disp("Done")
In this version I finally moved away from setdiff() into essentially an if statement. However their were a few optimizations that can still be done.
clear
clc
clf
close all
format long
%3000000 will take around 0.620596 sec and have 216816 primes
tic
max = 3000000;
stoppingPoint = floor(sqrt(max));
%Creates a vector of all odd numbers
numbers = 1:2:max;
%Adds 2 and removes 1
numbers(1) = 2;
%Creates Vector of all test values
test = 3:2:stoppingPoint;
%Goes through each test value
while (not(isempty(test)))
remove = test(1);
%Removes all values evenly divisible by the selected test value
numbers(numbers ~= remove & mod(numbers, remove) == 0) = [];
test(mod(test, remove) == 0) = [];
end
toc
disp(length(numbers))
disp("Done")
The original idea was to move from removing the values to adding the correct values as that is generally quicker then removing. However I couldn't get a solution that could work. Thus the main optimization ended up being in reducing the number of iterations for the for loop. This was done by using a floor() on sqrt(max) as it is up to the sqrt(max) but doesn't need to include, as well as only testing the odd numbers. The bigger optimization was also removing values from test as you already removed all values of 9 with 3 so you don't need to bother with 9.
I now started researching prime sieves and realized that you don't need modulo but instead can remove every 2nd or 3rd value. Then using a logical vector you can record all prime numbers then at the end remove all non prime numbers. However I can still also filter test to reduce the number of iterations, which I tried but looked like it would be too much work so it became it's own version.
clear
clc
%3000000 will take around 0.034346 sec and have 216816 primes
tic
max = 3000000;
stoppingPoint = floor(sqrt(max));
numbers = 1:2:max;
numbers(1) = 2;
isPrime = true(1, length(numbers));
test = 3:2:stoppingPoint;
while(not(isempty(test)))
i = test(1);
start = i - ceil(i / 2) + 1;
remove = start:i:length(numbers);
remove(1) = [];
isPrime(remove) = 0;
test(mod(test, i) == 0) = [];
end
numbers(not(isPrime)) = [];
toc
disp(length(numbers))
disp("Done")
As mentioned before this version successfully filtered test on each iteration decreasing the total number of iterations as well as only doing odd numbers. However I can do the same logical vector method with test as well, so I did.
clear
clc
%3000000 will take around 0.028370 sec and have 216816 primes
tic
max = 3000000;
stoppingPoint = floor(sqrt(max));
numbers = 1:2:max;
numbers(1) = 2;
isPrime = true(1, length(numbers));
test = 3:2:stoppingPoint;
isTest = true(1, length(test));
while(any(isTest))
i = test(find(isTest, 1));
start = i - ceil(i / 2) + 1;
remove = start:i:length(numbers);
removeEnd = ceil(length(test) / i);
testRemove = remove(1:removeEnd);
remove = remove(2:end);
isPrime(remove) = 0;
%remove = (start - 1):i:length(test);
testRemove = testRemove - 1;
isTest(testRemove) = 0;
end
numbers(~isPrime) = [];
toc
disp(length(numbers))
disp("Done")
I did succeed on using a logical vector for removing the values from test however I am unsure on how much quicker, if any, version 7 is at the very least not slower though.
clear
clc
max = 3000000;
%3000000 will take around 0.007245 sec and have 216816 primes
tic
numbers = primes(max);
toc
disp(length(numbers))
disp("Done")
Thinking I hit the limit and learning that the primes() function exists I wanted to see how quick it is. After realizing that it is very quick, around 10x quicker, I looked at the code and got some ideas for a version 8.
function p = primes(n)
%PRIMES Generate list of prime numbers.
% PRIMES(N) is a row vector of the prime numbers less than or
% equal to N. A prime number is one that has no factors other
% than 1 and itself.
%
% Class support for input N:
% float: double, single
% integer: uint8, int8, uint16, int16, uint32, int32, uint64, int64
%
% See also FACTOR, ISPRIME.
% Copyright 1984-2013 The MathWorks, Inc.
if ~isscalar(n)
error(message('MATLAB:primes:InputNotScalar'));
elseif ~isreal(n)
error(message('MATLAB:primes:ComplexInput'));
end
if n < 2
p = zeros(1,0,class(n));
return
elseif isfloat(n) && n > flintmax(class(n))
warning(message('MATLAB:primes:NGreaterThanFlintmax'));
n = flintmax(class(n));
end
n = floor(n);
p = true(1,double(ceil(n/2)));
q = length(p);
if (isa(n,'uint64') || isa(n,'int64')) && n > flintmax
ub = 2.^(nextpow2(n)/2); %avoid casting large (u)int64 to double
else
ub = sqrt(double(n));
end
for k = 3:2:ub
if p((k+1)/2)
p(((k*k+1)/2):k:q) = false;
end
end
p = cast(find(p)*2-1,class(n));
p(1) = 2;
clear
clc
%3000000 will take around 0.007600 sec and have 216816 primes
tic
max = 3000000;
stoppingPoint = floor(sqrt(max));
isPrime = true(1, ceil(max / 2));
isTest = true(1, ceil(stoppingPoint / 2));
isTest(1) = false;
while any(isTest)
index = find(isTest, 1);
i = 2 * index - 1;
isPrime(index:i:length(isPrime)) = false;
isPrime(index) = true;
isTest(index:i:length(isTest)) = false;
end
numbers = find(isPrime) * 2 - 1;
numbers(1) = 2;
toc
disp(length(numbers))
%all(numbers == primes(max))
disp("Done")
After looking at the primes() code I learned that updating logical values is quicker then number values. Thus the main numbers vector was removed, as well as find() isn't as bad as I originally thought so I used it more. That said it is now apparent that me also filtering test is starting to slow the code down, at least it is probably slower then an if statement.
clear
clc
%3000000 will take around 0.006841 sec and have 216816 primes
tic
max = 3000000;
stoppingPoint = floor(sqrt(max));
isPrime = true(1, ceil(max / 2));
for i = 3:2:stoppingPoint
if(isPrime((i + 1) / 2))
isPrime((1.5 * i + 0.5):i:length(isPrime)) = false;
end
end
numbers = find(isPrime) * 2 - 1;
numbers(1) = 2;
toc
disp(length(numbers))
%test = find(~isprime(numbers));
%all(numbers == primes(max))
disp("Done")
Admittedly this is basically the primes() function because after trying what they did for some reason it got faster. One change was putting the vector in the parenthesis of isPrime(), which for some reason is quicker. Another is doing 1.5 * i + 0.5 as the starting index as this removes the need to change isPrime() at i. But the big change was moving to a for loop with an if statement and removing the test vector entirely. I still want to see if I can remove the if statement and reduce iteration count but I am basically tied with primes() so don't think that would help.
Having made 11 different Versions I now wanted to graph them and see how they compare and how much they scale as the number to calculate all primes to increases.
clear
clc
%format longG
start = 1;
ending = 50000;
factor = 100;
collection = cell(factor, 2);
for i = start:ending
versionFunction = @() version5(i);
time = timeit(versionFunction);
if (mod(i, factor) ~= 0)
collection(mod(i, factor), :) = {i, time};
else
collection(factor, :) = {i, time};
end
if (i == ending || mod(i, factor) == 0)
writecell(collection,'TimerData.txt','WriteMode','append', ...
'Delimiter','tab')
collection = {};
disp(i / ending * 100)
end
end
disp("Done")
function primeNumbers = version1(max)
stoppingPoint = ceil(sqrt(max));
allNumbers = linspace(2,max, max - 1);
testNumbers = linspace(2, stoppingPoint, stoppingPoint - 1);
multiplyNumbers = reshape(allNumbers(1:end), [], 1);
removeNumber = multiplyNumbers .* testNumbers;
primeNumbers = setdiff(allNumbers, removeNumber);
end
function numbers = version2(max)
stoppingPoint = ceil(sqrt(max));
numbers = linspace(2,max, max - 1);
for i = 2:stoppingPoint
remove = i:i:max;
remove(1) = [];
numbers = setdiff(numbers, remove);
end
end
function numbers = version3(max)
stoppingPoint = ceil(sqrt(max));
numbers = linspace(2,max, max - 1);
for i = 2:stoppingPoint
numbers(numbers ~= i & mod(numbers, i) == 0) = [];
end
end
function numbers = version4(max)
stoppingPoint = floor(sqrt(max));
%Creates a vector of all odd numbers
numbers = 1:2:max;
%Adds 2 and removes 1
numbers(1) = 2;
%Creates Vector of all test values
test = 3:2:stoppingPoint;
%Goes through each test value
while (not(isempty(test)))
remove = test(1);
%Removes all values evenly divisible by the selected test value
numbers(numbers ~= remove & mod(numbers, remove) == 0) = [];
test(mod(test, remove) == 0) = [];
end
end
function numbers = version5(max)
stoppingPoint = floor(sqrt(max));
numbers = 1:1:max;
isPrime = true(1, max);
isPrime(1) = 0;
for i = 2:stoppingPoint
remove = i:i:max;
remove(1) = [];
isPrime(remove) = 0;
end
numbers(not(isPrime)) = [];
end
function numbers = version6(max)
stoppingPoint = floor(sqrt(max));
numbers = 1:2:max;
numbers(1) = 2;
isPrime = true(1, length(numbers));
test = 3:2:stoppingPoint;
while(not(isempty(test)))
i = test(1);
start = i - ceil(i / 2) + 1;
remove = start:i:length(numbers);
remove(1) = [];
isPrime(remove) = 0;
test(mod(test, i) == 0) = [];
end
numbers(not(isPrime)) = [];
end
function numbers = version7(max)
stoppingPoint = floor(sqrt(max));
numbers = 1:2:max;
numbers(1) = 2;
isPrime = true(1, length(numbers));
test = 3:2:stoppingPoint;
isTest = true(1, length(test));
while(any(isTest))
i = test(find(isTest, 1));
start = i - ceil(i / 2) + 1;
remove = start:i:length(numbers);
removeEnd = ceil(length(test) / i);
testRemove = remove(1:removeEnd);
remove = remove(2:end);
isPrime(remove) = 0;
%remove = (start - 1):i:length(test);
testRemove = testRemove - 1;
isTest(testRemove) = 0;
end
numbers(~isPrime) = [];
end
function numbers = version8(max)
stoppingPoint = floor(sqrt(max));
isPrime = true(1, ceil(max / 2));
isTest = true(1, ceil(stoppingPoint / 2));
isTest(1) = false;
while any(isTest)
index = find(isTest, 1);
i = 2 * index - 1;
isPrime(index:i:length(isPrime)) = false;
isPrime(index) = true;
isTest(index:i:length(isTest)) = false;
end
numbers = find(isPrime) * 2 - 1;
numbers(1) = 2;
end
function numbers = version9(max)
stoppingPoint = floor(sqrt(max));
isPrime = true(1, ceil(max / 2));
for i = 3:2:stoppingPoint
if(isPrime((i + 1) / 2))
isPrime((1.5 * i + 0.5):i:length(isPrime)) = false;
end
end
numbers = find(isPrime) * 2 - 1;
numbers(1) = 2;
end
function numbers = usingIsPrime(max)
numbers = linspace(1,max, max);
primes = isprime(numbers);
numbers = numbers(primes);
end
function numbers = usingPrimes(max)
numbers = primes(max);
end
Problem - Since I wanted to reduce the noise in the graph I wanted to average a amount of numbers and graph that instead to see any trends. As this isn't easy to do with built in function I used programming languages instead. For the graphs of All Versions I used Google Apps Script and as Google Sheets can't handle a million numbers well I used Excel for the Last Few Versions graph, thus I used MATLAB instead.
const ss = SpreadsheetApp.getActiveSpreadsheet()
const average = ss.getSheetByName("Average")
const data = ss.getSheetByName("Data")
/**
* Takes a range and returns the amount if values averaged
* @param {array} the array to average
* @param (int) the amount to average
* @return {array} the averaged array
* @customfunction
*/
function rangeAverage(arr, amount) {
var average = []
console.log(arr)
for (var i = 0; i < arr.length; i+=amount) {
sum = 0
count = 0
for(var j = i; (j < (i + amount)) && (j < arr.length); j++) {
test = arr[j][0]
//console.log('Test: ' + test)
if (typeof test == 'number') {
sum += test
count++
}
}
//console.log('Sum: ' + sum)
//console.log('Count: ' + count)
if (sum != 0 && count != 0) {
average.push(sum / count)
}
}
return average
}
function onEdit() {
factor = 100
allData = ss.getRange("Data!A2:L50001").getValues()
for(var i = 0; i < allData[0].length; i++) {
timeData = allData.map(j => [j[i]])
console.log(timeData)
timeData = rangeAverage(timeData, factor)
average.getRange(2, i + 1, timeData.length, 1).setValues(timeData.map(j => [j]))
}
//console.log(allData)
}
clear
clc
format longG
tic
factor = 2000;
data = readtable("Time Data Last Few To 1 Million.xlsx", 'Sheet','Data', ...
'Range', "A:D", 'VariableNamingRule','preserve');
average = table();
for i = 1:width(data)
count = 1;
columnName = strcat('Average_', ...
cell2mat(data.Properties.VariableNames(i)));
for start = 1:factor:height(data)
ending = start + factor - 1;
if (ending > height(data))
values = data(start:end, i);
else
values = data(start:ending, i);
end
values = table2array(values);
sectionAverage = mean(values);
average(count,columnName) = {sectionAverage};
count = count + 1;
end
end
writetable(average, "Time Data Last Few To 1 Million.xlsx", ...
'Sheet','Average')
disp("Done")
toc