Section 3.5
Viewing and Projection
This chapter on geometry finishes with a more complete discussion of projection and viewing transformations.
3.5.1 Perspective Projection
There are two general types of projection, perspective projection and orthographic projection. Perspective projection gives a realistic view. That is, it shows what you would see if the OpenGL display rectangle on your computer screen were a window into an actual 3D world (one that could extend in front of the screen as well as behind it). It shows a view that you could get by taking a picture of a 3D world with a camera. In a perspective view, the apparent size of an object depends on how far away it is from the viewer. Only things that are in front of the viewer can be seen. In fact, the part of the world that is in view is an infinite pyramid, with the viewer at the apex of the pyramid, and with the sides of the pyramid passing through the sides of the viewport rectangle.
However, OpenGL can't actually show everything in this pyramid, because of its use of the depth buffer to solve the hidden surface problem. Since the depth buffer can only store a finite range of depth values, it can't represent the entire range of depth values for the infinite pyramid that is theoretically in view. Only objects in a certain range of distances from the viewer are shown in the image that OpenGL produces. That range of distances is specified by two values, near and far. Both of these values must be positive numbers, and far must be greater than near. Anything that is closer to the viewer than the near distance or farther away than the far distance is discarded and does not appear in the rendered image. The volume of space that is represented in the image is thus a "truncated pyramid." This pyramid is the view volume:
The view volume is bounded by six planes -- the near and far clipping planes, plus the four planes through the left, right, top, and bottom sides of the pyramid. These planes are called clipping planes because anything that lies on the wrong side of any of them is clipped away.
In OpenGL, setting up the projection transformation is equivalent to defining the view volume. For a perspective transformation, you have to set up a view volume that is a truncated pyramid. A rather obscure term for this shape is a frustum, and a perspective transformation can be set up with the glFrustum command:
gl.glFrustum(xmin,xmax,ymin,ymax,near,far);
The last two parameters specify the near and far distances from the viewer, as already discussed. The viewer is assumed to be at the origin, (0,0,0), facing in the direction of the negative z-axis. (These are "eye coordinates" in OpenGL.) So, the near clipping plane is at z = −near, and the far clipping plane is at z = −far. The first four parameters specify the sides of the pyramid: xmin, xmax, ymin, and ymax specify the horizontal and vertical limits of the view volume at the near clipping plane. For example, the coordinates of the upper-left corner of the small end of the pyramid are (xmin,ymax,−near). Note that although xmin is usually equal to the negative of xmax and ymin is usually equal to the negative of ymax, this is not required. It is possible to have asymmetrical view volumes where the z-axis does not point directly down the center of the view.
When the glFrustum method is used to set up the projection transform, the matrix mode should be set to GL_PROJECTION. Furthermore, the identity matrix should be loaded before calling glFrustum (since glFrustum modifies the existing projection matrix rather than replacing it, and you don't even want to try to think about what would happen if you combine several projection matrices into one). So, a use of glFrustum generally looks like this, leaving the matrix mode set to GL_MODELVIEW at the end:
gl.glMatrixMode(GL.GL_PROJECTION);
gl.glLoadIdentity();
gl.glFrustum(xmin,xmax,ymin,ymax,near,far);
gl.glMatrixMode(GL.GL_MODELVIEW);
This might be used in the init() method, at the beginning of the display() method, or possibly in the reshape() method, if you want to take the aspect ratio of the viewport into account.
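As an illustration of that last point, here is a sketch (not taken from the text's sample programs) of how the frustum's x-limits might be chosen to match the viewport's aspect ratio; the helper name xLimitsForAspect and the particular limits are invented for this example.

```java
// Sketch: matching glFrustum's x-limits to the viewport's aspect ratio.
// Given fixed y-limits, the x-limits are the y-limits scaled by width/height.
public class FrustumLimits {

    // Pure helper (invented for this example): returns {xmin, xmax}.
    static double[] xLimitsForAspect(double ymin, double ymax, double aspect) {
        return new double[] { ymin * aspect, ymax * aspect };
    }

    /* In a JOGL reshape() method, this might be used as follows:
     *
     *    public void reshape(GLAutoDrawable drawable, int x, int y,
     *                        int width, int height) {
     *        GL gl = drawable.getGL();
     *        double aspect = (double)width / (double)height;
     *        double[] xlim = xLimitsForAspect(-1, 1, aspect);
     *        gl.glMatrixMode(GL.GL_PROJECTION);
     *        gl.glLoadIdentity();
     *        gl.glFrustum(xlim[0], xlim[1], -1, 1, 2, 20);
     *        gl.glMatrixMode(GL.GL_MODELVIEW);
     *    }
     */

    public static void main(String[] args) {
        // A viewport twice as wide as it is tall doubles the x-limits.
        double[] lim = xLimitsForAspect(-1, 1, 2.0);
        System.out.println(lim[0] + " " + lim[1]);  // -2.0 2.0
    }
}
```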
The glFrustum method is not particularly easy to use. The GLU library includes the method gluPerspective as an easier way to set up a perspective projection. If glu is an object of type GLU then
glu.gluPerspective(fieldOfViewAngle, aspect, near, far);
can be used instead of glFrustum. The fieldOfViewAngle is the vertical angle between the top of the view volume pyramid and the bottom. The aspect is the aspect ratio of the view, that is, the width of the pyramid at a given distance from the eye, divided by the height at the same distance. The value of aspect should generally be set to the aspect ratio of the viewport. The near and far parameters have the same meaning as for glFrustum.
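The relationship between the two commands can be made precise: gluPerspective(fov, aspect, near, far) corresponds to glFrustum with ymax equal to near times the tangent of half the field-of-view angle, ymin = -ymax, and the x-limits equal to the y-limits scaled by aspect. Here is a small sketch of that arithmetic; the method name frustumFromPerspective is invented for this example.

```java
// Sketch: computing the glFrustum limits that correspond to a call to
// gluPerspective(fovDegrees, aspect, near, far).
public class PerspectiveMath {

    // Returns {xmin, xmax, ymin, ymax} for the near clipping plane.
    static double[] frustumFromPerspective(double fovDegrees, double aspect,
                                           double near) {
        double ymax = near * Math.tan(Math.toRadians(fovDegrees) / 2);
        double xmax = ymax * aspect;  // x-limits scale with the aspect ratio
        return new double[] { -xmax, xmax, -ymax, ymax };
    }

    public static void main(String[] args) {
        // A 90-degree field of view with near = 1 gives ymax = tan(45 degrees),
        // which is 1 (up to floating-point rounding).
        double[] f = frustumFromPerspective(90, 1.0, 1.0);
        System.out.println(f[3]);
    }
}
```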
3.5.2 Orthographic Projection
Orthographic projections are comparatively easy to understand: A 3D world is projected onto a 2D image by discarding the z-coordinate of the eye-coordinate system. This type of projection is unrealistic in that it is not what a viewer would see. For example, the apparent size of an object does not depend on its distance from the viewer. Objects in back of the viewer as well as in front of the viewer are visible in the image. In fact, it's not really clear what it means to say that there is a viewer in the case of orthographic projection. Orthographic projections are still useful, however, especially in interactive modeling programs where it is useful to see true sizes and angles, unmodified by perspective.
Nevertheless, in OpenGL there is a viewer, which is located at the eye-coordinate origin, facing in the direction of the negative z-axis. Theoretically, a rectangular corridor extending infinitely in both directions, in front of the viewer and in back, would be in view. However, as with perspective projection, only a finite segment of this infinite corridor can actually be shown in an OpenGL image. This finite view volume is a parallelepiped -- a rectangular solid -- that is cut out of the infinite corridor by a near clipping plane and a far clipping plane. The value of far must be greater than near, but for an orthographic projection, the value of near is allowed to be negative, putting the "near" clipping plane behind the viewer, as it is in this illustration:
Note that a negative value for near puts the near clipping plane on the positive z-axis, which is behind the viewer.
An orthographic projection can be set up in OpenGL using the glOrtho method, which is generally called like this:
gl.glMatrixMode(GL.GL_PROJECTION);
gl.glLoadIdentity();
gl.glOrtho(xmin,xmax,ymin,ymax,near,far);
gl.glMatrixMode(GL.GL_MODELVIEW);
The first four parameters specify the x- and y-coordinates of the left, right, bottom, and top of the view volume. Note that the last two parameters are near and far, not zmin and zmax. In fact, the minimum z-value for the view volume is −far and the maximum z-value is −near. However, it is often the case that near = −far, and if that is true then the minimum and maximum z-values turn out to be near and far after all!
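To make the z-range rule concrete, here is a sketch (with limits invented for illustration) of a symmetric orthographic setup, together with a trivial helper that computes the visible z-range from near and far:

```java
// Sketch: for glOrtho(xmin,xmax,ymin,ymax,near,far), the visible z-range
// in eye coordinates is [-far, -near].
public class OrthoRange {

    static double[] zRange(double near, double far) {
        return new double[] { -far, -near };
    }

    /* A symmetric setup such as
     *
     *    gl.glMatrixMode(GL.GL_PROJECTION);
     *    gl.glLoadIdentity();
     *    gl.glOrtho(-5, 5, -5, 5, -10, 10);
     *    gl.glMatrixMode(GL.GL_MODELVIEW);
     *
     * uses near = -10 and far = 10, so near = -far and the visible z-range
     * is [-10,10] -- the same interval as (near,far), as noted above.
     */

    public static void main(String[] args) {
        double[] z = zRange(-10, 10);
        System.out.println(z[0] + " " + z[1]);  // -10.0 10.0
    }
}
```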
3.5.3 The Viewing Transform
To determine what a viewer will actually see in a 3D world, you have to do more than specify the projection. You also have to position the viewer in the world. That is done with the viewing transformation. Remember that the projection transformation is specified in eye coordinates, which have the viewer at the origin, facing down the negative direction of the z-axis. The viewing transformation says where the viewer really is, in terms of the world coordinate system, and where the viewer is really facing. The projection transformation can be compared to choosing which camera and lens to use to take a picture. The viewing transformation places the camera in the world and points it.
Recall that OpenGL has no viewing transformation as such. It has a modelview transformation, which combines the viewing transform with the modeling transform. While the viewing transformation moves the viewer, the modeling transformation moves the objects in the 3D world. The point here is that these are really equivalent operations, if not logically, then at least in terms of the image that is produced when the camera finally snaps a picture.
Suppose, for example, that we would like to move the camera from its default location at the origin back along the positive z-axis to the point (0,0,20). This operation has exactly the same effect as moving the world, and the objects that it contains, 20 units in the negative direction along the z-axis -- whichever operation is performed, the camera ends up in exactly the same position relative to the objects. It follows that both operations are implemented by the same OpenGL command, gl.glTranslatef(0,0,-20). More generally, applying any transformation to the camera is equivalent to applying the inverse, or opposite, of the transformation to the world. Rotating the camera to the left has the same effect as rotating the world to the right. This even works for scaling: Imagine yourself sitting inside the camera looking out. If the camera shrinks (and you along with it), it will look to you like the world outside is growing -- and what you see doesn't tell you which is really happening. Suppose that we use the commands
gl.glRotatef(90,0,1,0);
gl.glTranslatef(10,0,0);
to establish the viewing transformation. As a modeling transform, these commands would first translate an object 10 units in the positive x-direction, then rotate the object 90 degrees about the y-axis. An object that was originally at the origin ends up on the negative z-axis; the object is then directly in front of the viewer, at a distance of ten units. If we consider the same transformation as a viewing transform, the effect on the viewer is the inverse of the effect on the world. That is, the transform commands first rotate the viewer −90 degrees about the y-axis, then translate the viewer 10 units in the negative x-direction. This leaves the viewer on the negative x-axis at (−10,0,0), looking at the origin. An object at the origin will then be directly in front of the viewer, at a distance of 10 units. Under both interpretations of the transformation, the relationship of the viewer to the object is the same in the end.
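The arithmetic in this example is easy to check. The following sketch (not part of any sample program) applies the two commands to a point in the order OpenGL applies them to geometry -- the translation first, then the rotation -- and confirms that a point at the world origin ends up at (0,0,-10) in eye coordinates, directly in front of the viewer:

```java
// Sketch: checking that glRotatef(90,0,1,0) followed by glTranslatef(10,0,0)
// carries the world origin to (0,0,-10) in eye coordinates.
public class ViewEquivalence {

    // Applies the combined transform to a point: the translation acts first,
    // then the rotation, matching the order in which OpenGL applies the
    // commands to geometry.
    static double[] transform(double[] p) {
        double x = p[0] + 10, y = p[1], z = p[2];   // translate by (10,0,0)
        // Rotate 90 degrees about the y-axis:
        //   x' =  x*cos(90) + z*sin(90) =  z
        //   z' = -x*sin(90) + z*cos(90) = -x
        return new double[] { z, y, -x };
    }

    public static void main(String[] args) {
        double[] eye = transform(new double[] { 0, 0, 0 });
        System.out.println(eye[0] + " " + eye[1] + " " + eye[2]);  // 0.0 0.0 -10.0
    }
}
```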
Since this can be confusing, the GLU library provides a convenient method for setting up the viewing transformation:
glu.gluLookAt( eyeX,eyeY,eyeZ, refX,refY,refZ, upX,upY,upZ );
This method places the camera at the point (eyeX,eyeY,eyeZ), looking in the direction of the point (refX,refY,refZ). The camera is oriented so that the vector (upX,upY,upZ) points upwards in the camera's view. This method is meant to be called at the beginning of the display() method to establish the viewing transformation, and any further transformations that are applied after that are considered to be part of the modeling transformation.
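For example, the viewing transform might be set up with the commented gluLookAt call sketched below; the eye point, reference point, and up vector there are just sample values, and the helper viewDirection is invented for this sketch, to check which way such a camera faces:

```java
// Sketch: where gluLookAt points the camera.
public class LookAtDirection {

    // The camera at (eyeX,eyeY,eyeZ) faces toward (refX,refY,refZ).
    // This helper returns that direction as a unit vector.
    static double[] viewDirection(double[] eye, double[] ref) {
        double dx = ref[0] - eye[0], dy = ref[1] - eye[1], dz = ref[2] - eye[2];
        double len = Math.sqrt(dx*dx + dy*dy + dz*dz);
        return new double[] { dx/len, dy/len, dz/len };
    }

    /* In a display() method, the viewing transform might be set up as:
     *
     *    gl.glMatrixMode(GL.GL_MODELVIEW);
     *    gl.glLoadIdentity();
     *    glu.gluLookAt( 0,0,30,  0,0,0,  0,1,0 );  // eye, reference, up
     *    // ... modeling transformations and drawing follow ...
     */

    public static void main(String[] args) {
        // A camera at (0,0,30) looking at the origin faces down the
        // negative z-axis, like the default OpenGL viewer.
        double[] d = viewDirection(new double[]{0,0,30}, new double[]{0,0,0});
        System.out.println(d[0] + " " + d[1] + " " + d[2]);  // 0.0 0.0 -1.0
    }
}
```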
The Camera class that I wrote for the glutil package combines the functions of the projection and viewing transformations. For an object of type Camera, the method
camera.apply(gl);
is meant to be called at the beginning of the display method to set both the projection and the view. The viewing transform will be established with a call to glu.gluLookAt, and the projection will be set with a call to either glOrtho or glFrustum, depending on whether the camera is set to use orthographic or perspective projection. The parameters that will be used in these methods must be set before the call to camera.apply by calling other methods in the Camera object. The method
camera.setView(eyeX,eyeY,eyeZ, refX,refY,refZ, upX,upY,upZ );
sets the parameters that will be used in a call to glu.gluLookAt. In the default settings, the eye is at (0,0,30), the reference point is the origin (0,0,0), and the up vector is the y-axis. The method
camera.setLimits(xmin,xmax,ymin,ymax,zmin,zmax)
is used to specify the projection. The parameters do not correspond directly to the parameters to glOrtho or glFrustum. They are specified relative to a coordinate system in which the reference point (refX,refY,refZ) has been moved to the origin, the up vector (upX,upY,upZ) has been rotated onto the positive y-axis, and the viewer has been placed on the positive z-axis, at a distance from the origin equal to the distance between (eyeX,eyeY,eyeZ) and (refX,refY,refZ). (These are almost standard eye coordinates, except that the viewer has been moved some distance backwards along the positive z-axis.) In these coordinates, zmin and zmax specify the minimum and maximum z-values for the view volume, and xmin, xmax, ymin, and ymax specify the left-to-right and bottom-to-top limits of the view volume on the xy-plane, that is, at z = 0. (The x and y limits might be adjusted, depending on the configuration of the camera, to match the aspect ratio of the viewport.) Basically, you use the camera.setLimits command to establish a box around the reference point (refX,refY,refZ) that you would like to be in view.
3.5.4 A Simple Avatar
In all of our sample programs so far, the viewer has stood apart from the world, observing it from a distance. In many applications, such as 3D games, the viewer is a part of the world and gets to move around in it. With the right viewing transformation and the right user controls, this is not hard to implement. The sample program WalkThroughDemo.java is a simple example where the world consists of some random shapes scattered around a plane, and the user can move among them using the keyboard's arrow keys. Here is an applet version; you will have to click the applet to direct keyboard input to it:
The viewer in this program is represented by an object of the class SimpleAvatar, which I have added to the glutil package. A SimpleAvatar represents a point of view that can be rotated and moved. The rotation is about the viewer's vertical axis and is controlled in the applet by the left and right arrow keys. Pressing the left-arrow key rotates the viewer through a positive angle, which the viewer perceives as turning towards the left. Similarly, the right-arrow key rotates the viewer towards the right. The up-arrow and down-arrow keys move the viewer forward or backward in the direction that the viewer is currently facing. The motion is parallel to the xz-plane; the viewer's height above this plane does not change.
In terms of the programming, the viewer is subject to a rotation about the y-axis and a translation. These transformations are applied to the viewer in that order (not the reverse -- a rotation following a translation to the point (x,y,z) would move the viewer away from that point). However, this viewing transform is equivalent to applying the inverse transform to the world and is implemented in the code as
gl.glLoadIdentity();
gl.glRotated(-angle,0,1,0);
gl.glTranslated(-x,-y,-z);
where angle is the rotation applied to the viewer and (x,y,z) is the point to which the viewer is translated. This code can be found in the apply() method in the SimpleAvatar class, and that method is meant to be called in the display method, before anything is drawn, to establish the appropriate projection and viewing transformations to show the world as seen from the avatar's point of view.
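The text does not show SimpleAvatar's movement code, but the geometry behind the arrow-key motion can be sketched. A viewer whose heading is a rotation of angle degrees about the y-axis faces in the direction (-sin(angle), 0, -cos(angle)), so moving forward a distance d changes the position by that vector times d. The helper moveForward below is invented for this sketch and is not necessarily how SimpleAvatar does it:

```java
// Sketch: moving a viewer forward in the direction it faces, assuming the
// viewer's heading is a rotation of angleDegrees about the y-axis and the
// unrotated viewer faces the negative z-axis. Motion stays parallel to the
// xz-plane, so y is unchanged and is omitted.
public class AvatarMove {

    // Returns the new {x, z} after moving forward by dist.
    static double[] moveForward(double x, double z,
                                double angleDegrees, double dist) {
        double a = Math.toRadians(angleDegrees);
        return new double[] { x - dist * Math.sin(a), z - dist * Math.cos(a) };
    }

    public static void main(String[] args) {
        // With angle = 0 the viewer faces the negative z-axis, so moving
        // forward 5 units decreases z by 5 and leaves x unchanged.
        double[] p = moveForward(0, 0, 0, 5);
        System.out.println(p[0] + " " + p[1]);  // 0.0 -5.0
    }
}
```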
3.5.5 Viewer Nodes in Scene Graphs
For another example of the same idea, we can return to the idea of scene graphs, which were introduced in Subsection 2.1.5. A scene graph is a data structure that represents the contents of a scene. But if we truly want to make our viewer part of the scene, then there should be a way for the viewer to be part of the data for the scene, that is, part of the scene graph. We would like to be able to represent the viewer as a node in a scene graph, in the same way that a cube or sphere can be represented by a node. This would mean that the viewer could be subjected to transformations just like any other node in the scene graph. And it would mean that a viewer can be part of a complex, hierarchical model. The viewer might be the driver in a moving car or a rider on an amusement park ride.
A scene is a hierarchical structure in which complex objects can be built up out of simpler objects, and transformations can be applied to objects on any level of the hierarchy. The overall transformation that is finally applied to an object consists of the product of all the transformations from the root of the scene graph to the object. If we place a viewer into a scene graph, then the viewer should be subject to transformation in exactly the same way. For example, if the viewer is part of a complex object representing a car, then the viewer should be subject to exactly the same transformation as the car and this should allow the viewer to turn and move along with the car.
However, the viewer is not quite the same as other objects in the scene graph. First of all, the viewer is not a visible object. It could be "attached" to a visible object, but the viewer we are talking about is really a point of view, represented by a projection and viewing transformation. A second point, which is crucial, is that the viewer's projection and viewing transformation have to be established before anything is drawn. This means that we can't simply traverse the scene graph and implement the viewer node when we come to it -- the viewer node has to be applied before we even start traversing the scene graph. And while there can be several viewer nodes in a scene graph, there can only be one view at a time. There has to be one active viewer whose view is shown in the rendered image. Of course, it's possible to switch from one viewer to another and to redraw the image from the new point of view.
The package simplescenegraph3d contains a very simple implementation of scene graphs for 3D worlds. One of the classes in this package is AvatarNode, which represents exactly the type of viewer node that I have been discussing. An AvatarNode can be added to a scene graph and transformed just like any other node. It can be part of a complex object, and it will be carried along with that object when the object is transformed.
Remember that to apply a transformation to a viewer, you have to apply the inverse of that transformation to the world. When the viewer is represented by an AvatarNode in a scene graph, the transformation that we want to apply to the viewer is the product of all the transformations applied to nodes along a path from the root of the scene graph to the AvatarNode. To apply the inverse of this transformation, we need to apply the inverses of these transformations, in the opposite order. To do this, we can start at the AvatarNode and walk up the path in the scene graph, from child node to parent node until we reach the top. Along the way, we apply the inverse of the transformation for each node. To make it possible to navigate a scene graph in this way, each node has a parent pointer that points from a child node to its parent node. There is also a method applyInverseTransform that applies the inverse of the transform in the node. So, the code in the AvatarNode class for setting up the viewing transformation is:
SceneNode3D node = this;
while (node != null) {
    node.applyInverseTransform(gl);
    node = node.parent;
}
An AvatarNode has an apply method that should be called at the beginning of the display method to set up the projection and viewing transformations that are needed to show the world from the point of view of that avatar. The code for setting up the viewing transformation is in the apply method.
Here is an applet that uses this technique to show you a moving scene from several possible viewpoints. A pop-up menu below the scene allows the user to select one of three possible points of view, including two views that are represented by objects of type AvatarNode in a scene graph. (The third view is the familiar global view in which the user can rotate the scene using the mouse.) The source code for the program is MovingCameraDemo.java.