Section 3.1
Vectors, Matrices, and Homogeneous Coordinates
Before moving on with OpenGL, we look at the mathematical background in a little more depth. The mathematics of computer graphics is primarily linear algebra, which is the study of vectors and linear transformations. A vector, as we have seen, is a quantity that has a length and a direction. A vector can be visualized as an arrow, as long as you remember that it is the length and direction of the arrow that are relevant, and that its specific location is irrelevant. If we visualize a vector V as starting at the origin and ending at a point P, then we can to a certain extent identify V with P -- at least to the extent that both V and P have coordinates, and their coordinates are the same. For example, the 3D point (x,y,z) = (3,4,5) has the same coordinates as the vector (dx,dy,dz) = (3,4,5). For the point, the coordinates (3,4,5) specify a position in space in the xyz coordinate system. For the vector, the coordinates (3,4,5) specify the change in the x, y, and z coordinates along the vector. If we represent the vector with an arrow that starts at the origin (0,0,0), then the head of the arrow will be at (3,4,5).
The distinction between a point and a vector is subtle. For some purposes, the distinction can be ignored; for other purposes, it is important. Often, all that we have is a sequence of numbers, which we can treat as the coordinates of either a vector or a point at will.
Matrices are rectangular arrays of numbers. A matrix can be used to apply a transformation to a vector (or to a point). The geometric transformations that are so important in computer graphics are represented as matrices.
In this section, we will look at vectors and matrices and at some of the ways that they can be used. The treatment is not very mathematical. The goal is to familiarize you with the properties of vectors and matrices that are most relevant to OpenGL.
3.1.1 Vector Operations
We assume for now that we are talking about vectors in three dimensions. A 3D vector can be specified by a triple of numbers, such as (0,1,0) or (3.7,−12.88,0.02). Most of the discussion, except for the "cross product," carries over easily into other dimensions.
One of the basic properties of a vector is its length. In terms of its coordinates, the length of a vector (x,y,z) is given by sqrt(x*x + y*y + z*z). (This is just the Pythagorean theorem in three dimensions.) If v is a vector, its length can be denoted by |v|. The length of a vector is also called its norm.
Vectors of length 1 are particularly important. They are called unit vectors. If v = (x,y,z) is any vector other than (0,0,0), then there is exactly one unit vector that points in the same direction as v. That vector is given by
( x/length, y/length, z/length )
where length is the length of v. Dividing a vector by its length is said to normalize the vector: The result is a unit vector that points in the same direction as the original vector.
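As a minimal sketch of length and normalization in Java, assuming a vector is stored as a double[] of length 3 (the class and method names here are hypothetical and are not part of OpenGL):

```java
class VectorLength {

    // Length (norm) of a 3D vector: sqrt(x*x + y*y + z*z).
    static double length(double[] v) {
        return Math.sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    }

    // Returns a new unit vector that points in the same direction as v.
    // The zero vector has no direction, so it is rejected.
    static double[] normalize(double[] v) {
        double len = length(v);
        if (len == 0) {
            throw new IllegalArgumentException("Cannot normalize the zero vector.");
        }
        return new double[] { v[0]/len, v[1]/len, v[2]/len };
    }

    public static void main(String[] args) {
        double[] v = { 3, 4, 0 };
        double[] u = normalize(v);
        System.out.println(length(v));   // 5.0
        System.out.println(length(u));   // 1, up to rounding error
    }
}
```

The check for a zero length matters in practice, since dividing by zero would silently fill the result with NaN values.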
Given two vectors v1 = (x1,y1,z1) and v2 = (x2,y2,z2), the dot product of v1 and v2 is denoted by v1·v2 and is defined by
v1·v2 = x1*x2 + y1*y2 + z1*z2
Note that the dot product is a number, not a vector. The dot product has several very important geometric meanings. First of all, note that the length of a vector v is just the square root of v·v. Furthermore, the dot product of two non-zero vectors v1 and v2 has the property that
cos(angle) = v1·v2 / (|v1|*|v2|)
where angle is the measure of the angle from v1 to v2. In particular, in the case of unit vectors, whose lengths are 1, the dot product of two unit vectors is simply the cosine of the angle between them. Furthermore, since the cosine of a 90-degree angle is zero, two non-zero vectors are perpendicular if and only if their dot product is zero. Because of these properties, the dot product is particularly important in lighting calculations, where the effect of light shining on a surface depends on the angle that the light makes with the surface.
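The dot product and the angle formula can be sketched in Java as follows. This is an illustration only: the class and method names are hypothetical, and vectors are assumed to be non-zero double[] arrays of length 3.

```java
class DotProductDemo {

    // Dot product of two 3D vectors; the result is a number, not a vector.
    static double dot(double[] a, double[] b) {
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
    }

    // Angle between two non-zero vectors, in radians, computed from
    // cos(angle) = a·b / (|a|*|b|).
    static double angleBetween(double[] a, double[] b) {
        double cos = dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
        // Rounding error can push the value slightly outside [-1,1],
        // so clamp it before calling acos.
        cos = Math.max(-1.0, Math.min(1.0, cos));
        return Math.acos(cos);
    }

    public static void main(String[] args) {
        double[] xAxis = { 1, 0, 0 };
        double[] yAxis = { 0, 1, 0 };
        double[] diagonal = { 1, 1, 0 };
        System.out.println(dot(xAxis, yAxis));  // 0.0, so the axes are perpendicular
        System.out.println(Math.toDegrees(angleBetween(xAxis, diagonal)));  // about 45
    }
}
```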
The dot product is defined in any dimension. For vectors in 3D, there is another type of product called the cross product, which also has an important geometric meaning. For vectors v1 = (x1,y1,z1) and v2 = (x2,y2,z2), the cross product of v1 and v2 is denoted v1×v2 and is the vector defined by
v1×v2 = ( y1*z2 - z1*y2, z1*x2 - x1*z2, x1*y2 - y1*x2 )
If v1 and v2 are non-zero vectors, then v1×v2 is zero if and only if v1 and v2 point in the same direction or in exactly opposite directions. Assuming v1×v2 is non-zero, then it is perpendicular both to v1 and to v2; furthermore, the vectors v1, v2, v1×v2 follow the right-hand rule; that is, if you curl the fingers of your right hand from v1 to v2, then your thumb points in the direction of v1×v2. If v1 and v2 are perpendicular unit vectors, then the cross product v1×v2 is also a unit vector, which is perpendicular both to v1 and to v2.
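The cross product formula translates directly into code. The following is a small sketch in Java, with a hypothetical class name and vectors assumed to be double[] arrays of length 3:

```java
import java.util.Arrays;

class CrossProductDemo {

    // Cross product a×b of two 3D vectors; the result is a vector.
    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]
        };
    }

    public static void main(String[] args) {
        double[] xAxis = { 1, 0, 0 };
        double[] yAxis = { 0, 1, 0 };
        // By the right-hand rule, x-axis × y-axis points along the z-axis.
        System.out.println(Arrays.toString(cross(xAxis, yAxis)));  // [0.0, 0.0, 1.0]
    }
}
```

Swapping the two arguments reverses the direction of the result, so the order of the vectors matters.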
Finally, I will note that given two points P1 = (x1,y1,z1) and P2 = (x2,y2,z2), the difference P2−P1 which is defined by
P2 − P1 = ( x2 − x1, y2 − y1, z2 − z1 )
is a vector that can be visualized as an arrow that starts at P1 and ends at P2. Now, suppose that P1, P2, and P3 are vertices of a polygon. Then the vectors P1−P2 and P3−P2 lie in the plane of the polygon, and so the cross product
(P3−P2) × (P1−P2)
is either zero or is a vector that is perpendicular to the polygon. This fact allows us to use the vertices of a polygon to produce a normal vector to the polygon. Once we have that, we can normalize the vector to produce a unit normal. (It's possible for the cross product to be zero. This will happen if P1, P2, and P3 lie on a line. In that case, another set of three vertices might work. Note that if all the vertices of a polygon lie on a line, then the polygon degenerates to a line segment and has no interior points at all. We don't need unit normals for such polygons.)
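As a sketch of how this might look in code, the following Java method computes a unit normal from three vertices using (P3−P2) × (P1−P2). The class and method names are hypothetical, points are assumed to be double[] arrays of length 3, and returning null for collinear points is just one possible way to signal that case.

```java
class PolygonNormal {

    // Returns a unit normal to the plane through p1, p2, p3,
    // or null if the three points lie on a line.
    static double[] unitNormal(double[] p1, double[] p2, double[] p3) {
        // Two vectors that lie in the plane of the polygon.
        double[] a = { p3[0]-p2[0], p3[1]-p2[1], p3[2]-p2[2] };  // P3 - P2
        double[] b = { p1[0]-p2[0], p1[1]-p2[1], p1[2]-p2[2] };  // P1 - P2
        // Cross product a×b, which is perpendicular to both.
        double nx = a[1]*b[2] - a[2]*b[1];
        double ny = a[2]*b[0] - a[0]*b[2];
        double nz = a[0]*b[1] - a[1]*b[0];
        double len = Math.sqrt(nx*nx + ny*ny + nz*nz);
        if (len == 0) {
            return null;  // Collinear points: try another set of three vertices.
        }
        return new double[] { nx/len, ny/len, nz/len };
    }

    public static void main(String[] args) {
        // Three vertices of a polygon that lies in the xy-plane.
        double[] n = unitNormal(new double[] {0,0,0},
                                new double[] {1,0,0},
                                new double[] {1,1,0});
        System.out.println(n[2]);  // 1.0; the normal points along the z-axis
    }
}
```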
3.1.2 Matrices and Transformations
A matrix is just a two-dimensional array of numbers. Suppose that a matrix M has r rows and c columns. Let v be a c-dimensional vector, that is, a vector of c numbers. Then it is possible to multiply M by v to yield another vector, which will have dimension r. For a programmer, it's probably easiest to define this type of multiplication with code. Suppose that we represent M and v by the arrays
double[][] M = new double[r][c];
double[] v = new double[c];
Then we can define the product w = M*v as follows:
double[] w = new double[r];
for (int i = 0; i < r; i++) {
    w[i] = 0;
    for (int j = 0; j < c; j++) {
        w[i] = w[i] + M[i][j] * v[j];
    }
}
If you think of a row, M[i], of M as being a c-dimensional vector, then w[i] is simply the dot product M[i]·v.
Using this definition of the multiplication of a vector by a matrix, a matrix defines a transformation that can be applied to one vector to yield another vector. Transformations that are defined in this way are called linear transformations, and they are the main object of study in the field of mathematics known as linear algebra.
Rotation, scaling, and shear are linear transformations, but translation is not. To include translations, we have to widen our view to include affine transformations. An affine transformation can be defined, roughly, as a linear transformation followed by a translation. For computer graphics, we are interested in affine transformations in three dimensions. However -- by what seems at first to be a very odd trick -- we can narrow our view back to the linear by moving into the fourth dimension.
Note first of all that an affine transformation in three dimensions transforms a vector (x1,y1,z1) into a vector (x2,y2,z2) given by formulas
x2 = a1*x1 + a2*y1 + a3*z1 + t1
y2 = b1*x1 + b2*y1 + b3*z1 + t2
z2 = c1*x1 + c2*y1 + c3*z1 + t3
These formulas express a linear transformation given by multiplication by the 3-by-3 matrix
a1  a2  a3
b1  b2  b3
c1  c2  c3
followed by translation by t1 in the x direction, t2 in the y direction and t3 in the z direction. The trick is to replace each three-dimensional vector (x,y,z) with the four-dimensional vector (x,y,z,1), adding a "1" as the fourth coordinate. And instead of the 3-by-3 matrix, we use the 4-by-4 matrix
a1  a2  a3  t1
b1  b2  b3  t2
c1  c2  c3  t3
0   0   0   1
If the vector (x1,y1,z1,1) is multiplied by this 4-by-4 matrix, the result is precisely the vector (x2,y2,z2,1). That is, instead of applying the affine transformation to the 3D vector (x1,y1,z1), we can apply a linear transformation to the 4D vector (x1,y1,z1,1).
This might seem pointless to you, but nevertheless, that is what OpenGL does: It represents affine transformations as 4-by-4 matrices, in which the bottom row is (0,0,0,1), and it converts three-dimensional vectors into four-dimensional vectors by adding a 1 as the final coordinate. The result is that all the affine transformations that are so important in computer graphics can be implemented as matrix multiplication.
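To see the trick at work, here is a small Java sketch that multiplies a 4-by-4 matrix by a vector of the form (x,y,z,1). The class name and the particular matrix are chosen only for illustration. The matrix scales by 2 and then translates by (10,20,30).

```java
import java.util.Arrays;

class AffineAsMatrix {

    // Product M*v of a matrix with r rows and c columns and a c-dimensional vector.
    static double[] multiply(double[][] M, double[] v) {
        double[] w = new double[M.length];
        for (int i = 0; i < M.length; i++) {
            for (int j = 0; j < v.length; j++) {
                w[i] += M[i][j] * v[j];
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Upper-left 3-by-3 block: scaling by 2.  Last column: translation.
        // Bottom row: (0,0,0,1), as for every affine transformation.
        double[][] M = {
            { 2, 0, 0, 10 },
            { 0, 2, 0, 20 },
            { 0, 0, 2, 30 },
            { 0, 0, 0,  1 }
        };
        double[] v = { 1, 1, 1, 1 };  // The 3D vector (1,1,1) with a 1 added.
        System.out.println(Arrays.toString(multiply(M, v)));  // [12.0, 22.0, 32.0, 1.0]
    }
}
```

The first three coordinates of the result are the same values that the affine formulas give, and the fourth coordinate is still 1.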
One advantage of using matrices to represent transforms is that matrices can be multiplied. In particular, if A and B are 4-by-4 matrices, then their matrix product A*B is another 4-by-4 matrix. Each of the matrices A, B, and A*B represents a linear transformation. The important fact is that applying the single transformation A*B to a vector v has the same effect as first applying B to v and then applying A to the result. Mathematically, this can be said very simply: (A*B)*v = A*(B*v). For computer graphics, it means that the operation of following one transform by another simply means multiplying their matrices. This allows OpenGL to keep track of a single modelview matrix, rather than a sequence of individual transforms. Transform commands such as glRotatef and glTranslated are implemented as matrix multiplication -- the current modelview matrix is multiplied by a matrix representing the transform that is being applied, yielding a matrix that represents the combined transform. You might compose your modelview transform as a long sequence of modeling and viewing transforms, but when the transform is actually applied to a vertex, only a single matrix multiplication is necessary. The matrix that is used represents the entire sequence of transforms, all multiplied together. It's really a very neat system.
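The identity (A*B)*v = A*(B*v) can be checked with a short Java sketch. The class and method names are hypothetical, and the two matrices (a translation and a scaling) are chosen only as an example. The sketch also shows that the order of the factors matters.

```java
import java.util.Arrays;

class ComposeTransforms {

    // Matrix product A*B of two 4-by-4 matrices.
    static double[][] multiply(double[][] A, double[][] B) {
        double[][] P = new double[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                for (int k = 0; k < 4; k++)
                    P[i][j] += A[i][k] * B[k][j];
        return P;
    }

    // Product M*v of a 4-by-4 matrix and a 4D vector.
    static double[] apply(double[][] M, double[] v) {
        double[] w = new double[4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                w[i] += M[i][j] * v[j];
        return w;
    }

    public static void main(String[] args) {
        double[][] A = { {1,0,0,5}, {0,1,0,0}, {0,0,1,0}, {0,0,0,1} };  // Translate by (5,0,0).
        double[][] B = { {2,0,0,0}, {0,2,0,0}, {0,0,2,0}, {0,0,0,1} };  // Scale by 2.
        double[] v = { 1, 1, 1, 1 };
        // Applying A*B is the same as applying B first and then A.
        System.out.println(Arrays.toString(apply(multiply(A, B), v)));  // [7.0, 2.0, 2.0, 1.0]
        System.out.println(Arrays.toString(apply(A, apply(B, v))));     // [7.0, 2.0, 2.0, 1.0]
        // B*A translates first and then scales, which gives a different result.
        System.out.println(Arrays.toString(apply(multiply(B, A), v)));  // [12.0, 2.0, 2.0, 1.0]
    }
}
```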
3.1.3 Homogeneous Coordinates
There is one transformation in computer graphics that is not an affine transformation: In the case of a perspective projection, the projection transformation is not affine. In a perspective projection, an object will appear to get smaller as it moves farther away from the viewer, and that is a property that no affine transformation can express.
Surprisingly, we can still represent a perspective projection as a 4-by-4 matrix, provided we are willing to stretch our use of coordinates even further than we have already. We have already represented 3D vectors by 4D vectors in which the fourth coordinate is 1. We now allow the fourth coordinate to be anything at all. When the fourth coordinate, w, is non-zero, we consider the coordinates (x,y,z,w) to represent the three-dimensional vector (x/w,y/w,z/w). Note that this is consistent with our previous usage, since it considers (x,y,z,1) to represent (x,y,z), as before. When the fourth coordinate is zero, there is no corresponding 3D vector, but it is possible to think of (x,y,z,0) as representing a 3D "point at infinity" in the direction of (x,y,z).
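The conversion from homogeneous coordinates to an ordinary 3D vector is a simple division. Here is a minimal sketch in Java. The class and method names are hypothetical, and returning null when w is zero is just one way to represent the fact that there is no corresponding 3D vector.

```java
import java.util.Arrays;

class Homogeneous {

    // Converts homogeneous coordinates (x,y,z,w) to the 3D vector (x/w,y/w,z/w).
    // Returns null when w is zero, since there is no corresponding 3D vector.
    static double[] toCartesian(double[] h) {
        double w = h[3];
        if (w == 0) {
            return null;  // A "point at infinity" in the direction of (x,y,z).
        }
        return new double[] { h[0]/w, h[1]/w, h[2]/w };
    }

    public static void main(String[] args) {
        // Both of these represent the same 3D vector, (1,2,3).
        System.out.println(Arrays.toString(toCartesian(new double[] {1, 2, 3, 1})));  // [1.0, 2.0, 3.0]
        System.out.println(Arrays.toString(toCartesian(new double[] {2, 4, 6, 2})));  // [1.0, 2.0, 3.0]
    }
}
```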
Coordinates (x,y,z,w) used in this way are referred to as homogeneous coordinates. If we use homogeneous coordinates, then any 4-by-4 matrix can be used to transform three-dimensional vectors, and among the transformations that can be represented in this way is the projection transformation for a perspective projection. And in fact, this is what OpenGL does internally. It represents three-dimensional points and vectors using homogeneous coordinates, and it represents all transformations as 4-by-4 matrices. You can even specify vertices using homogeneous coordinates. For example, the command
gl.glVertex4d(x,y,z,w);
generates the 3D point (x/w,y/w,z/w). Fortunately, you will almost never have to deal with homogeneous coordinates directly. The only real exception to this is that homogeneous coordinates are required when setting the position of a light, as we'll see in the next chapter.
3.1.4 Vector Forms of OpenGL Commands
Some OpenGL commands take parameters that are vectors (or points, if you want to look at it that way) given in the form of arrays of numbers. For some commands, such as glVertex and glColor, vector parameters are an option. Others, such as the commands for setting material properties, only work with vectors.
Commands that take parameters in array form have names that end in "v". For example, you can use the command gl.glVertex3fv(A,offset) to generate a vertex that is given by three numbers in an array A of type float[]. The second parameter is an integer offset value that specifies the starting index of the vector in the array. For example, you might use an array to store the coordinates of the three vertices of a triangle -- say (1,0,0), (−1,2,2), and (1,1,−3) -- and use that array to draw the triangle:
float[] vert = new float[] { 1, 0, 0,  -1, 2, 2,  1, 1, -3 };
gl.glBegin(GL.GL_POLYGON);
gl.glVertex3fv(vert, 0);  // Equivalent to gl.glVertex3f(vert[0],vert[1],vert[2]).
gl.glVertex3fv(vert, 3);  // Equivalent to gl.glVertex3f(vert[3],vert[4],vert[5]).
gl.glVertex3fv(vert, 6);  // Equivalent to gl.glVertex3f(vert[6],vert[7],vert[8]).
gl.glEnd();
(This will make a lot more sense if you already have the vertex data in an array for some reason.)
Similarly, there are methods such as glVertex2dv and glColor3fv. The color command can be useful when working with Java Color objects, since a color object c has a method c.getComponents(null) that returns an array containing the red, green, blue, and alpha components of the color as float values in the range 0.0 to 1.0, and a method c.getColorComponents(null) that returns only the red, green, and blue components. (The null parameter tells the method to create and return a new array; the parameter could also be an array into which the method would place the data.) You can use these methods with the commands gl.glColor4fv and gl.glColor3fv, respectively, to set OpenGL's color from the Java Color object c. For example:
gl.glColor3fv( c.getColorComponents(null), 0);
(I should note that the versions of these commands in the OpenGL API for the C programming language do not have a second parameter. In C, the only parameter is a pointer, and you can pass a pointer to any array element. If vert is an array of floats, you can call glVertex3fv(vert) when the vertex is given by the first three elements of the array. You can call glVertex3fv(&vert[3]), or equivalently glVertex3fv(vert+3), when the vertex is given by the next three array elements, and so on. I should also mention that Java has alternative forms for these methods that use things called "buffers" instead of arrays; these forms will be covered later in the chapter.)
OpenGL has a number of methods for reading the current values of OpenGL state variables. Many of these values can be retrieved using four generic methods that take an array as a parameter:
gl.glGetFloatv( propertyCodeNumber, destinationArray, offset );
gl.glGetDoublev( propertyCodeNumber, destinationArray, offset );
gl.glGetIntegerv( propertyCodeNumber, destinationArray, offset );
gl.glGetBooleanv( propertyCodeNumber, destinationArray, offset );
In these methods, the first parameter is an integer constant such as GL.GL_CURRENT_COLOR or GL.GL_MODELVIEW_MATRIX that specifies which state variable you want to read. The second parameter is an array of appropriate type into which the retrieved value of the state variable will be placed. And offset tells the starting index in the array where the data should be placed; it will probably be 0 in most cases. (For glGetBooleanv, the array type is byte[], and the numbers 0 and 1 are used to represent the boolean values false and true.) For example, to retrieve the current color, you could say
float[] saveColor = new float[4];  // Space for red, green, blue, and alpha.
gl.glGetFloatv( GL.GL_CURRENT_COLOR, saveColor, 0 );
You might use this to save the current color while you do some drawing that requires changing to another color. Later, you could restore the current color to its previous state by saying
gl.glColor4fv( saveColor, 0 );
You could use glGetDoublev instead of glGetFloatv, if you prefer. OpenGL will convert the data to the type that you request. However, as usual, float values might be the most efficient. There are many, many state variables that you can read in this way. I will mention just a few more as examples. You can read the current viewport with
int[] viewport = new int[4];  // Space for x, y, width, height.
gl.glGetIntegerv( GL.GL_VIEWPORT, viewport, 0 );
For boolean state variables such as whether lighting is currently enabled, you can call glGetBooleanv with an array of length 1:
byte[] lit = new byte[1];  // Space for the single return value.
gl.glGetBooleanv( GL.GL_LIGHTING, lit, 0 );
I might mention that if you want to retrieve and save a value so that you can restore it later, there are better ways to do so, which will be covered later in the chapter.
Perhaps most interesting for us now, you can retrieve the current transformation matrices. A transformation matrix is represented by a one-dimensional array of length 16. The 4-by-4 transformation matrix is stored in the array in column-major order, that is, the entries in the first column from top to bottom, followed by the entries in the second column, and so on. You could retrieve the current modelview transformation with
float[] transform = new float[16];
gl.glGetFloatv( GL.GL_MODELVIEW_MATRIX, transform, 0 );
It is possible to set the transformation matrix to a value given in the same form, using the glLoadMatrixf method. For example, you could restore the modelview matrix to the value that was retrieved by the previous command using
gl.glLoadMatrixf( transform, 0 );
But again, if you just want to save a transform so that you can restore it later, there is a better way to do it -- in this case by using glPushMatrix and glPopMatrix.
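Column-major order is easy to get wrong, so here is a small Java sketch of the index arithmetic. It uses no OpenGL calls, and the class and method names are hypothetical. In an array of 16 numbers in column-major order, the entry in row "row" and column "col" (both counted from 0) is at index col*4 + row.

```java
import java.util.Arrays;

class ColumnMajor {

    // Unpacks a column-major array of 16 floats into a 4-by-4 array
    // that is indexed as rows[row][col].
    static float[][] toRows(float[] m) {
        float[][] rows = new float[4][4];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                rows[row][col] = m[col*4 + row];
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // A translation by (7,8,9).  The translation amounts form the fourth
        // column of the matrix, so they occupy indices 12, 13, and 14.
        float[] translation = { 1,0,0,0,  0,1,0,0,  0,0,1,0,  7,8,9,1 };
        for (float[] row : toRows(translation)) {
            System.out.println(Arrays.toString(row));
        }
        // Prints:
        // [1.0, 0.0, 0.0, 7.0]
        // [0.0, 1.0, 0.0, 8.0]
        // [0.0, 0.0, 1.0, 9.0]
        // [0.0, 0.0, 0.0, 1.0]
    }
}
```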
Just as you can multiply the current transform matrix by a rotation matrix by calling glRotatef or by a translation matrix by calling glTranslatef, you can multiply the current transform matrix by an arbitrary matrix using gl.glMultMatrixf(matrix,offset). As with the other matrix methods, the matrix is given by a one-dimensional array of 16 floats. One situation in which this can be useful is to do shear transformations. For example, consider a shear that transforms the vector (x1,y1,z1) to the vector (x2,y2,z2) given by
x2 = x1 + s * z1
y2 = y1
z2 = z1
where s is a constant shear amount. The 4-by-4 matrix for this transformation is hard to construct from scaling, rotation, and translation, but it is very simple:
1  0  s  0
0  1  0  0
0  0  1  0
0  0  0  1
To use this shear transformation as a modeling transform, you can use glMultMatrixf as follows:
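float[] shear = new float[] {   // Column-major: each group of four is one column.
    1, 0, 0, 0,
    0, 1, 0, 0,
    s, 0, 1, 0,                 // s is a float that holds the shear amount.
    0, 0, 0, 1
};
gl.glMultMatrixf( shear, 0 );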
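One way to convince yourself that the array has its entries in the right places is to apply it to a vector by hand, without OpenGL. The following Java sketch does that. The class and method names are hypothetical, and the shear amount and test vector are arbitrary.

```java
import java.util.Arrays;

class ShearCheck {

    // The shear matrix for x2 = x1 + s*z1, as 16 floats in column-major order.
    static float[] shearMatrix(float s) {
        return new float[] { 1,0,0,0,  0,1,0,0,  s,0,1,0,  0,0,0,1 };
    }

    // Multiplies a column-major 4-by-4 matrix by the vector (x,y,z,w).
    static float[] apply(float[] m, float[] v) {
        float[] w = new float[4];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                w[row] += m[col*4 + row] * v[col];
            }
        }
        return w;
    }

    public static void main(String[] args) {
        float[] v = { 1, 2, 4, 1 };  // The point (1,2,4) in homogeneous coordinates.
        float[] result = apply(shearMatrix(0.5f), v);
        // x2 = 1 + 0.5*4 = 3, while y, z, and w are unchanged.
        System.out.println(Arrays.toString(result));  // [3.0, 2.0, 4.0, 1.0]
    }
}
```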