Reducing divisions when trying to use implicit-equation triangle drawing is perfectly possible. Let’s see…
3d to 2d coordinates conversion:
The usual implicit equations are:
So, the obvious approach is to first convert 3d to 2d and then use the 2d coords… but what if we used the 3d coords directly?… let’s substitute a bit…
… fill out missing factors while keeping the same value…
… work a bit on c0…
… OMG!!! WE ARE DIVIDING ALWAYS BY THE SAME FACTOR!!!…
Let’s recall a bit… these (a,b,c) factors are per-edge and we don’t really need their value, just the sign… this means that we don’t have to care about the w factor, since it’s the same factor in a, b and c (and positive for vertices in front of the camera, so dropping it can’t flip the sign)!!! SO, recalling a bit, to calc the edges for a poly, we just need to do:
Lo and behold… we need no stinking divides anywhere!!!
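A minimal sketch of the idea, assuming screen coordinates x = X/W, y = Y/W with no viewport scale or offset, W > 0 for every vertex, and the common edge form a·x + b·y + c with a = y0 - y1, b = x1 - x0, c = x0·y1 - x1·y0 (this convention is an assumption; other sign conventions work the same way). Under those assumptions the division-free coefficients for an edge are just the cross product of its two (X, Y, W) vertices, which is W0·W1 times the 2d coefficients. The function names are hypothetical.

```python
def cross(p, q):
    """Cross product of two (X, Y, W) vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def edge_coeffs_2d(v0, v1):
    """(a, b, c) for edge v0->v1 via projected coords: 2 divides per vertex."""
    x0, y0 = v0[0] / v0[2], v0[1] / v0[2]
    x1, y1 = v1[0] / v1[2], v1[1] / v1[2]
    return (y0 - y1, x1 - x0, x0 * y1 - x1 * y0)


def edge_coeffs_3d(v0, v1):
    """Same edge straight from (X, Y, W): W0*W1 times the 2d result, no divides."""
    return cross(v0, v1)


def inside(tri, x, y):
    """True if screen point (x, y) passes all three edge tests (either winding)."""
    s = [a * x + b * y + c
         for a, b, c in (edge_coeffs_3d(tri[i], tri[(i + 1) % 3]) for i in range(3))]
    return all(e >= 0 for e in s) or all(e <= 0 for e in s)
```

Because W0·W1 > 0, dropping it cannot flip any sign; if a vertex can have W ≤ 0 (behind the camera), that no longer holds and clipping is still needed.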
It’s funny how it works… while watching the 4k compos at Breakpoint 2008 I was talking with wisefox about trying to eliminate divisions from the polygon drawing setups…
To recap, let’s forget for a moment about texturing and just try to get which pixels are to be drawn and which are to be left untouched…
Input data is 3 vertices in 3d:
Next we have to transform from 3d to 2d, which takes 3 divs and 6 muls:
But we can be a bit sneaky about how we do the divisions:
This form needs 1 div and 20 muls… but there are a lot of common parts that can be factored out, ending up with 1 div and 13 muls.
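One way to do the factoring, as a sketch: d is a hypothetical projection scale (focal length) folded into the single reciprocal, every Z is assumed non-zero, and the grouping below is just one arrangement that reaches the 1 divide / 13 multiplies count.

```python
def project_naive(verts, d):
    """3 divides + 6 multiplies: x = d*X/Z, y = d*Y/Z per vertex."""
    out = []
    for X, Y, Z in verts:
        s = d / Z                      # 1 divide per vertex
        out.append((X * s, Y * s))     # 2 multiplies per vertex
    return out


def project_one_div(verts, d):
    """1 divide + 13 multiplies, sharing one reciprocal of Z0*Z1*Z2."""
    (X0, Y0, Z0), (X1, Y1, Z1), (X2, Y2, Z2) = verts
    z01 = Z0 * Z1                      # mul 1
    r = d / (z01 * Z2)                 # mul 2 and the only divide: d/(Z0*Z1*Z2)
    s0 = Z1 * Z2 * r                   # muls 3-4: equals d/Z0
    s1 = Z0 * Z2 * r                   # muls 5-6: equals d/Z1
    s2 = z01 * r                       # mul 7: equals d/Z2
    return [(X0 * s0, Y0 * s0),        # muls 8-13
            (X1 * s1, Y1 * s1),
            (X2 * s2, Y2 * s2)]
```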
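So, if a div takes A cycles and a mul takes B cycles, the original method costs 3A + 6B cycles and the new one A + 13B, which is a win whenever 2A > 7B.
We are carrying a lot of divs and muls, so it is better to do all of this on the FPU to keep some semblance of precision… on a 68060 an FPU division takes 37 cycles and a multiplication 3 cycles, so let’s count: the original method needs 3×37 + 6×3 = 129 cycles, the new one 37 + 13×3 = 76 cycles.
The new method takes only about 59% of the time of the original one!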
Ok, so we have reduced the number of calculations for calculating the 2d projection… next time we can see how to reduce the time needed for the edge deltas!
After calculating the 3d to 2d projections, there are two parts of a mapper that need some common values: back face culling and calculating texture deltas.
Back face culling works by measuring the screen space “signed area” of a polygon: its magnitude is the projected area, and its sign is positive for front facing polygons and negative for back facing ones.
So when this value is negative, we can simply skip drawing the polygon.
The classic formula using 2d coordinates is something like this:
We can substitute for the original 3d coordinates:
And make all parts have the same denominator:
And try to push out the denominator zzz:
There are 2 important things here:
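A minimal sketch of the division-free back-face test, assuming camera-space vertices with every Z > 0, screen coordinates x = d·X/Z and y = d·Y/Z (d is a hypothetical projection scale), and "positive means front facing" as above. Working the substitution through, twice the signed area equals d²·det/(Z0·Z1·Z2), where det is the 3×3 determinant of the vertex coordinates; since d² and zzz are both positive, the sign of det alone decides culling. Function names are hypothetical.

```python
def signed_area_2d(p0, p1, p2):
    """Twice the signed screen-space area, from projected 2d points."""
    return ((p1[0] - p0[0]) * (p2[1] - p0[1])
            - (p2[0] - p0[0]) * (p1[1] - p0[1]))


def det3(v0, v1, v2):
    """Determinant of the 3x3 matrix whose rows are camera-space (X, Y, Z)."""
    X0, Y0, Z0 = v0
    X1, Y1, Z1 = v1
    X2, Y2, Z2 = v2
    return (X0 * (Y1 * Z2 - Z1 * Y2)
            - Y0 * (X1 * Z2 - Z1 * X2)
            + Z0 * (X1 * Y2 - Y1 * X2))


def is_front_facing(v0, v1, v2):
    """No divides: area = d*d*det3/(Z0*Z1*Z2), with d*d > 0 and Z0*Z1*Z2 > 0."""
    return det3(v0, v1, v2) > 0
```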
So in previous parts we have calculated the 2d coords and whether the polygon is facing towards us. That’s good, but we can’t draw a polygon without knowing which pixels are inside and which are outside…
For that, the usual way was to get the starting and ending point of each edge, and calculate the ratio of their differences.
Let’s take for example the edge from vertex 0 to vertex 1.
First we have to get the difference along the X and Y axes:
Then, we calculate the ratio of them:
And then, we can do a loop to calculate all the X coords for each Y:
Which can be calculated by starting with:
and using a loop like this:
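A minimal sketch of that classic edge walk, assuming integer scanlines, an edge going downwards (y1 > y0), and no subpixel correction; edge_xs is a hypothetical name. The single dx/dy divide per edge is the cost the following steps try to remove.

```python
def edge_xs(x0, y0, x1, y1):
    """Classic DDA: one X per scanline from y0 up to (not including) y1."""
    dx = x1 - x0               # difference along X
    dy = y1 - y0               # difference along Y
    slope = dx / dy            # the per-edge divide
    xs = []
    x = x0                     # starting value
    for _ in range(y0, y1):
        xs.append(x)
        x += slope             # one add per scanline
    return xs
```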
But we have a big problem: each triangle has 3 sides, and each of them needs a division… is there any way to reduce that?
Let’s recap… we start with the original formula:
And go back to using the original 3d values:
Then, we substitute for the intermediate values used on previous parts:
And we can push out and eliminate www:
All fine and dandy, we can use that last formula as a template for the other edges (nice patterns on the letters hehe):
We first define some short hand terms:
And substitute:
We are very near the end!
First expand across all values:
And then factor the common divisor:
Wow, the letter patterns are starting to hurt my head but it sure has its payoff: we are down from 6 to 2 divides per triangle!!!
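A sketch of the same result in code, under stated assumptions: camera-space vertices, Z > 0, x = d·X/Z and y = d·Y/Z. Per edge, d and Zi·Zj cancel, so dx/dy = (Xj·Zi - Xi·Zj)/(Yj·Zi - Yi·Zj); the three denominators are then ganged behind one reciprocal, the same trick as in the projection. It assumes no horizontal edge (a zero denominator would zero the product), and the exact letter patterns in the original may be grouped differently.

```python
def slopes_one_div(v0, v1, v2):
    """dx/dy for edges 0-1, 1-2, 2-0 from camera-space vertices, with 1 divide."""
    verts = (v0, v1, v2)
    nums, dens = [], []
    for i in range(3):
        Xi, Yi, Zi = verts[i]
        Xj, Yj, Zj = verts[(i + 1) % 3]
        nums.append(Xj * Zi - Xi * Zj)   # dx without the d/(Zi*Zj) factor
        dens.append(Yj * Zi - Yi * Zj)   # dy without the same factor
    r = 1.0 / (dens[0] * dens[1] * dens[2])   # the only divide
    return (nums[0] * dens[1] * dens[2] * r,
            nums[1] * dens[0] * dens[2] * r,
            nums[2] * dens[0] * dens[1] * r)
```

Together with the single divide in the projection, that is the 2 divides per triangle mentioned above.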
So, we have figured out how to optimise a lot, but we still have no way to calculate texture deltas… let’s try to do the calculations for linear texturing!
One of the basic formulas for texture deltas is this one:
As you can see, the patterns are similar to the calculations for the area, and in fact the denominator is the area itself, so let’s substitute:
We need to correlate v0, v1 and v2 to the zzz divider which is being used all over the place, so we define this:
And we can substitute just like we did when calculating the area formula:
Push out the zzz divides:
And notice that both num and area are being divided by (zzz*zzz), so it can be eliminated:
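A sketch of where this leads, assuming camera-space vertices with Z > 0, x = d·X/Z, y = d·Y/Z (d a hypothetical projection scale) and the standard plane-fit gradients du/dx = ((u1-u0)(y2-y0) - (u2-u0)(y1-y0))/area and du/dy = ((x1-x0)(u2-u0) - (x2-x0)(u1-u0))/area. The helper values P and Q are invented here: P_i = X_i·Z_j·Z_k and Q_i = Y_i·Z_j·Z_k, so x_i = d·P_i/zzz and y_i = d·Q_i/zzz. Numerators scale by d/zzz and the area by (d/zzz)², so one divide serves all four deltas.

```python
def linear_gradients(verts, uvs, d):
    """du/dx, dv/dx, du/dy, dv/dy for affine texturing, with a single divide."""
    (X0, Y0, Z0), (X1, Y1, Z1), (X2, Y2, Z2) = verts
    z12, z02, z01 = Z1 * Z2, Z0 * Z2, Z0 * Z1
    P = (X0 * z12, X1 * z02, X2 * z01)     # x_i = d * P_i / zzz
    Q = (Y0 * z12, Y1 * z02, Y2 * z01)     # y_i = d * Q_i / zzz
    zzz = z01 * Z2
    dP1, dP2 = P[1] - P[0], P[2] - P[0]
    dQ1, dQ2 = Q[1] - Q[0], Q[2] - Q[0]
    area = dP1 * dQ2 - dP2 * dQ1           # scaled signed area, no divides
    r = zzz / (d * area)                   # the only divide, shared by all 4
    grads = []
    for k in range(2):                     # k = 0 -> u, k = 1 -> v
        t1 = uvs[1][k] - uvs[0][k]
        t2 = uvs[2][k] - uvs[0][k]
        ddx = (t1 * dQ2 - t2 * dQ1) * r
        ddy = (dP1 * t2 - dP2 * t1) * r
        grads.append((ddx, ddy))
    (dudx, dudy), (dvdx, dvdy) = grads
    return dudx, dvdx, dudy, dvdy
```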
Extending this to perspective correct gradients is easy: use these values for the starting points:
And these formulas for the numerators of the deltas, using the same principles as in the other parts:
note: delta formulas taken from kalms/tbl page for texture gradients
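A sketch of the starting values for perspective correct mapping, assuming camera-space Z > 0 and per-vertex texture coordinates (u, v). The per-vertex 1/z, u/z and v/z all reuse one reciprocal of Z0·Z1·Z2, exactly like the projection; feeding these three attributes through the gradient numerators in place of u and v then gives the perspective deltas. perspective_starts is a hypothetical name.

```python
def perspective_starts(verts, uvs):
    """Per-vertex (1/z, u/z, v/z) for perspective mapping, with 1 divide."""
    (_, _, Z0), (_, _, Z1), (_, _, Z2) = verts
    z01 = Z0 * Z1
    r = 1.0 / (z01 * Z2)                        # the only divide
    inv_z = (Z1 * Z2 * r, Z0 * Z2 * r, z01 * r)  # 1/Z0, 1/Z1, 1/Z2
    return [(iz, u * iz, v * iz) for iz, (u, v) in zip(inv_z, uvs)]
```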
I had a revelation today. It may not work, or be totally irrelevant, but that’s not important for me at the moment. The idea itself is cool, and that is worth writing about.
When doing perspective correct mapping, you have to interpolate u/z and 1/z horizontally and perform (u/z)/(1/z) every so often. That division is usually done in a pipelined manner so that we continue performing mapping while it is calculated, so in effect it may be “free”.
But we can also “accumulate” all these divisions and perform them in one go, reusing the same technique we used on the first part to compute the 3d-to-2d projections for the three triangle vertices.
Let’s work through an example on a scanline that has 4 control points:
We can calculate the values we need in the old way:
But we can also factor out the divisions like this:
The result, as usual, is trading divisions for many multiplies.
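A sketch for n control points. The "multiply by all the other denominators" form grows quickly in multiplies as n rises, so this swaps in a prefix-product formulation (commonly known as Montgomery's batch-inversion trick) that needs one divide and about 3 multiplies per value. Control points are assumed to be (u/z, v/z, 1/z) triples with 1/z non-zero; the function names are hypothetical.

```python
def batch_reciprocals(qs):
    """Return [1/q for q in qs] using one divide and about 3 multiplies per value."""
    n = len(qs)
    prefix = [0.0] * n
    acc = 1.0
    for i, q in enumerate(qs):
        prefix[i] = acc          # product of qs[0..i-1]
        acc *= q
    inv = 1.0 / acc              # the only divide: 1/(q0*q1*...*q(n-1))
    out = [0.0] * n
    for i in range(n - 1, -1, -1):
        out[i] = inv * prefix[i]  # equals 1/qs[i]
        inv *= qs[i]              # drop qs[i] from the running reciprocal
    return out


def control_point_uvs(ctrl):
    """ctrl: list of (u/z, v/z, 1/z) at the control points -> list of (u, v)."""
    zs = batch_reciprocals([iz for _, _, iz in ctrl])   # z = 1/(1/z)
    return [(uz * z, vz * z) for (uz, vz, _), z in zip(ctrl, zs)]
```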
Now some parts of the audience will say “But I’m already pipelining my perspective divides with my texturing, what can I gain from this technique?”.
As I said at the start, you could gang all the usual divides you’d do in a scanline into just one, which would mean a lot of work calculating all the different numerators. Another way would be to gang only 2 consecutive divides, which is easier on the code and a very simple way to get 8 pixel steps for the same number of divides as 16 pixel ones.
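A sketch of the 2-at-a-time variant, assuming a span described by u/z and 1/z at its left end plus their per-pixel deltas (uz0, iz0, duz, diz and step are hypothetical names). Exact u is computed at every step pixels, with pixels in between interpolated linearly as usual (not shown); only u is shown, since v/z reuses the same reciprocals. For a 32-pixel span with 8-pixel steps this needs 3 divides, the same as computing x = 0, 16, 32 one at a time.

```python
def pair_reciprocals(a, b):
    """1/a and 1/b for one divide and three multiplies (assumes a, b != 0)."""
    r = 1.0 / (a * b)
    return b * r, a * r


def span_control_us(uz0, iz0, duz, diz, length, step=8):
    """Exact perspective u at x = 0, step, 2*step, ... up to `length`.

    Control points are taken two at a time, so each pair costs one divide.
    Returns (xs, us, divides).
    """
    xs = list(range(0, length + 1, step))
    us, divides = [], 0
    for k in range(0, len(xs), 2):
        pair = xs[k:k + 2]
        izs = [iz0 + diz * x for x in pair]       # 1/z at each control point
        if len(pair) == 2:
            zs = pair_reciprocals(izs[0], izs[1])  # 2 points, 1 divide
        else:
            zs = (1.0 / izs[0],)                   # leftover point
        divides += 1
        us.extend((uz0 + duz * x) * z for x, z in zip(pair, zs))
    return xs, us, divides
```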
I feel there are a lot of things which can be improved by generalizing this technique.