Last time, I discussed how to render with a VertexBuffer. But what we rendered used transformed coordinates, which is to say it was up to us to figure out where every vertex went on the screen. If we’re writing a typical Direct3D application, though, that’s going to be a ton of work – imagine all the code you’d have to write to render an animation of a tank firing a shell, updating each of the thousands of vertices that make up the scene and doing all the math to ensure that they wound up in the right place on the screen.
Obviously, this is the sort of thing that a common library should make easier, and Direct3D does exactly that. To take advantage of Direct3D's capabilities, however, we need to understand coordinate systems and transformations.
Simply put, a coordinate system is just a way of describing points in space. There are many different possible coordinate systems – you may have heard of Cartesian coordinates, spherical coordinates, cylindrical coordinates or others. These can all be useful in Direct3D, and often come into play in real-world applications. If – like me – you’re just getting started with Direct3D, there’s one that you’re more likely to run across than any other: the three-dimensional left-handed Cartesian coordinate system.
This system is pretty straightforward. It identifies points in three dimensions using an x, y, and z coordinate. All of the axes are perpendicular (properly speaking I should say orthogonal) to each other. The x coordinate increases horizontally to the right. The y coordinate increases vertically upward. And the z coordinate increases “into” the screen, away from the viewer. This last bit is important – most scientific and engineering textbooks that you might have used in school use a right-handed coordinate system, where z increases toward the viewer. In computer graphics, this is reversed to give a natural meaning to coordinates – the z value becomes a measure of how far away something is.
A picture of the difference between left- and right-handed systems is shown here:
The hands show an easy mnemonic to help you remember which is which: In a left-handed system, you use your left hand to curl your fingers so as to “fold” the x axis onto the y axis, and your thumb will point along the positive z axis. In a right-handed system, you use your right hand.
Rendering a scene involves a number of steps, and the most convenient way to represent an object differs from step to step. For example, if I were considering a box (say I was modeling a treasure chest in a game), when defining the vertices that make up that box, it would probably be most convenient to think of one of the corners as being the origin, and the other corners being at coordinates like (0, 0, 3), (0, 5, 0) and (3, 5, 0). Certainly that’s going to make things easier than if the origin is way off in the distance somewhere, and the vertices have coordinates like (1.234, 17.85, 1123908.85) and (452.23, 1423.123, 17.972).
The coordinate system that’s convenient for a given object is called the local coordinate system, local space, or object space.
There’s another coordinate system that’s convenient for the scene as a whole. For example, imagine that we have a room with a treasure chest, a table, and a lamp in it. Most likely, we’d want a coordinate system that was different than the local coordinate system for the chest. We might want one that had its origin in the corner of the room, and whose axes lined up with the walls and floor. That would make it easy to express the position of any given object: “The chest is three meters right, level with the floor, and two meters in” would translate to (3, 0, 2).
We call this coordinate system world coordinates or world space. The idea is that it’s a coordinate system that expresses the position of things in the world. All objects in the scene have their own set of local coordinates, but they share world coordinates.
At this point we have to consider the problem of going between coordinate systems. After all, let’s say we wanted to put three treasure chests in the room – unless they’re all at the same place (unlikely), we’re going to need a way to morph the local coordinates that were so convenient for defining the box into three different sets of world coordinates so that everything has a position relative to everything else.
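To make the problem concrete, here’s a minimal sketch in plain C# (no DirectX required) – the three chest positions are made up for illustration. Each chest reuses the same local-space corner, and a simple per-chest translation produces three different sets of world coordinates:

```csharp
using System;

// One corner of the chest, defined in the chest's local space.
double lx = 3, ly = 5, lz = 0;

// Made-up world-space positions for three chests in the room.
(double X, double Y, double Z)[] chestPositions =
{
    (3, 0, 2),  // "three meters right, on the floor, two meters in"
    (7, 0, 2),
    (5, 0, 6),
};

// Translating the local coordinates by each chest's position gives the
// world coordinates of that corner. A world matrix generalizes this idea
// to include rotation and scaling as well as translation.
foreach (var p in chestPositions)
{
    double wx = lx + p.X, wy = ly + p.Y, wz = lz + p.Z;
    Console.WriteLine($"Corner in world space: ({wx}, {wy}, {wz})");
}
```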
The tool that we use in both mathematics and in computer graphics for moving between coordinate systems is the matrix. You need to know about these if you want to do Direct3D. I won’t apologize for this or try to tell you that you can skate by without knowing anything – you can’t unless you want to be stuck writing completely trivial stuff. Fortunately it’s not particularly difficult math – go Google “Introduction Matrices” and you’ll turn up about a half-million hits that will explain the basics. I’m going to assume you’ve done so from here on out.
If we pick our matrix elements carefully, multiplying that matrix with a set of coordinates in one coordinate system turns them into the equivalent coordinates in another system. Direct3D refers to such a matrix as a transform, and there are several important ones. The transform that locates our object in world space (it turns the object’s local coordinates into world coordinates) is the world transform.
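Here’s what that multiplication actually looks like, sketched in plain C#. Direct3D treats points as row vectors multiplied on the left of a 4×4 matrix, so translation lives in the bottom row; the point and translation values are made up for illustration:

```csharp
using System;

// A 4x4 translation matrix in row-vector convention (the convention
// Direct3D uses: transformed = point * matrix).
double[,] world =
{
    { 1, 0, 0, 0 },
    { 0, 1, 0, 0 },
    { 0, 0, 1, 0 },
    { 3, 0, 2, 1 },  // move the object to (3, 0, 2) in world space
};

// A local-space point, written homogeneously as (x, y, z, 1).
double[] local = { 0, 5, 0, 1 };

// Row vector times matrix.
double[] worldPos = new double[4];
for (int col = 0; col < 4; col++)
    for (int row = 0; row < 4; row++)
        worldPos[col] += local[row] * world[row, col];

Console.WriteLine($"({worldPos[0]}, {worldPos[1]}, {worldPos[2]})");
```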
As it turns out, matrices in Direct3D are represented by the Microsoft.DirectX.Matrix structure. Note that this is in the Microsoft.DirectX namespace, not the Microsoft.DirectX.Direct3D namespace, because matrices are used by other members of the DirectX API family too.
Fortunately for us, the Matrix structure has lots of helper methods that save us from having to do the slightly hairy math of figuring out the appropriate matrix coefficients. Usually. For example, we could get a matrix that represented a 45-degree rotation around the y axis by calling
Matrix worldTxfm = Matrix.RotationY((float)(Math.PI / 4));
Note that the argument is an angle specified in radians, not degrees. It’s easy: 180 degrees equals π radians.
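The conversion, and what the rotation matrix actually does to a point, can be sketched in plain C#. The sign layout below follows Direct3D’s row-vector RotationY convention; the test point (1, 0, 0) is made up for illustration:

```csharp
using System;

// Degrees to radians: 180 degrees equals pi radians.
double DegreesToRadians(double degrees) => degrees * Math.PI / 180.0;

double angle = DegreesToRadians(45);  // same as Math.PI / 4

// The heart of a Y-rotation matrix is just sine and cosine. Applying a
// 45-degree Y rotation to the point (1, 0, 0):
double x = 1, z = 0;
double xRot = x * Math.Cos(angle) + z * Math.Sin(angle);
double zRot = -x * Math.Sin(angle) + z * Math.Cos(angle);

Console.WriteLine($"({xRot:F4}, 0, {zRot:F4})");  // (0.7071, 0, -0.7071)
```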
There are two more transforms that you’ll commonly encounter when writing Direct3D applications. These are the view transform and the projection transform. Let’s consider the view transform first.
While object space is a coordinate system that is convenient to the objects in the scene, certain operations that need to be performed are more easily expressed in view space. This is sometimes called camera space, because the space is defined as having its origin at the camera (or eye) point. That is, something at (0, 0, 0) is co-located with the viewer.
In addition to always putting the origin at the camera point, view space orients the z axis to point in the same direction as the camera, and the y axis to point “up”. We can get a matrix that represents the transformation necessary to change world coordinates into view-space coordinates using the Matrix helper method LookAtLH.
Matrix viewTxfm = Matrix.LookAtLH(
    new Vector3(0, 0, -5),  // Camera located 5 units "out" of the screen
    new Vector3(),          // Looking at the origin
    new Vector3(0, 1, 0));  // Up is in the positive y direction
LookAtLH takes three arguments: the position of the camera, the position to look at, and the direction for “up”. All of these coordinates are specified with the Vector3 structure, and all of them are specified in world coordinates. This is very, very convenient, since it gives us a very natural way to view the scene – we can simply consider the “camera” to be another object in the scene, expressing its position and orientation in the same coordinate system as the tanks, bullets, explosions, and chain-gun toting Smurfs that we’re rendering. Oh, and in case you hadn’t guessed, the LH stands for left-handed.
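Under the hood, a left-handed look-at matrix is built from an orthonormal camera basis. The sketch below re-derives that construction by hand in plain C# – same camera inputs as the call above – and checks where the camera and the look-at target land in view space:

```csharp
using System;

// The standard left-handed look-at construction, sketched by hand:
// build an orthonormal camera basis, then express points relative to it.
(double X, double Y, double Z) eye = (0, 0, -5);
(double X, double Y, double Z) target = (0, 0, 0);
(double X, double Y, double Z) up = (0, 1, 0);

(double X, double Y, double Z) Sub((double X, double Y, double Z) a, (double X, double Y, double Z) b)
    => (a.X - b.X, a.Y - b.Y, a.Z - b.Z);
double Dot((double X, double Y, double Z) a, (double X, double Y, double Z) b)
    => a.X * b.X + a.Y * b.Y + a.Z * b.Z;
(double X, double Y, double Z) Cross((double X, double Y, double Z) a, (double X, double Y, double Z) b)
    => (a.Y * b.Z - a.Z * b.Y, a.Z * b.X - a.X * b.Z, a.X * b.Y - a.Y * b.X);
(double X, double Y, double Z) Normalize((double X, double Y, double Z) a)
{
    double len = Math.Sqrt(Dot(a, a));
    return (a.X / len, a.Y / len, a.Z / len);
}

var zAxis = Normalize(Sub(target, eye));  // camera's forward direction
var xAxis = Normalize(Cross(up, zAxis));  // camera's right
var yAxis = Cross(zAxis, xAxis);          // camera's actual up

// View-space position of a world-space point: project (point - eye)
// onto each camera axis.
(double X, double Y, double Z) ToViewSpace((double X, double Y, double Z) p)
{
    var rel = Sub(p, eye);
    return (Dot(rel, xAxis), Dot(rel, yAxis), Dot(rel, zAxis));
}

Console.WriteLine(ToViewSpace(eye));     // (0, 0, 0): the camera sits at the origin
Console.WriteLine(ToViewSpace(target));  // (0, 0, 5): the target is 5 units ahead
```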
View space is convenient for graphical operations that have to do with the camera (such as z-buffering, a topic we’ll cover later) but it’s not very useful for drawing pixels on the screen, which is our ultimate goal. To turn objects in three-dimensional view space into coordinates in two-dimensional screen space, we use the projection transform. (For an interesting discussion of different sorts of projection transforms, read here.) The Matrix structure has a helper method for this as well:
Matrix projTxfm = Matrix.PerspectiveFovLH(
    (float)Math.PI / 4.0F,  // Standard field of view
    1.0F,                   // Aspect ratio
    1.0F, 10.0F);           // Clip objects > 10 or < 1 unit from the camera
PerspectiveFovLH takes four arguments: a field of view, an aspect ratio, and a pair of distances that define the clipping planes.
The field of view essentially indicates what sort of lens we’re using: wide-angle or narrow-angle. Larger numbers here will put more and more of the “world” on the screen, but will result in more and more distortion. Smaller numbers mean less distortion, but less and less of the world will be visible. Generally, it’s best to stick with a value of π/4 radians.
The aspect ratio indicates how “wide-screen” the rendering will be – it’s the ratio of the width of the view to its height. A value of 1.0 indicates that we’re trying to render a square image. A value of 2.0 would mean that we wanted to render an image that was twice as wide as it is tall, similar to a movie screen or HDTV. Note that regardless of the aspect ratio, Direct3D will always stretch the image to fit the actual window we’re drawing into, so there will be some distortion unless the aspect ratio matches the actual width-to-height ratio of the window.
The last two arguments specify the clipping planes. Anything closer to the camera than the first distance, or farther away than the second distance, will not be drawn. Note that these distances are with respect to the camera. If an object is partially within and partially outside of these boundaries, it will be clipped. That is, you’ll only see that part of the object that falls between these two distances.
It turns out to be somewhat important to try to get the clipping planes as close as possible to the near and far extents of the scene. It’s tempting to just set them at some really small value and some really high value, but this tends to screw up things like z-buffering. No one said cool 3D graphics was easy!
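A quick sketch shows why. The left-handed perspective projection (the formula behind PerspectiveFovLH) maps camera-space z in [near, far] to a depth value in [0, 1], but not linearly, and the near half of the scene hogs most of the range. The distances below match the earlier projection call:

```csharp
using System;

// Depth as stored in the z-buffer after a left-handed perspective
// projection: z in [near, far] maps to depth in [0, 1], nonlinearly.
double near = 1.0, far = 10.0;

double Depth(double z) => far / (far - near) * (1 - near / z);

Console.WriteLine(Depth(1.0));   // 0: on the near plane
Console.WriteLine(Depth(10.0));  // 1: on the far plane

// A point only one unit past the near plane has already used up more
// than half the depth range; the far half of the scene is squeezed into
// a sliver. Pushing the far plane way out makes this much worse, which
// is why sloppy clip planes hurt z-buffering.
Console.WriteLine(Depth(2.0));
```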
What I haven’t shown you is how to make use of all this stuff in a program that actually draws pixels on the screen. But with this background under our belts, that won’t take much code; I’ll show you how to do it next time.