In this article, we will replace our immobile 2D triangle with a spinning 3D cube. The end result will look like this:
Now that we’re finally getting something interesting on the screen, I can include more pictures! An album of animated gifs is available here: http://imgur.com/a/x8q7R
In order to make this spinning cube, we will learn a bit about matrix math, and how it is used to create perspective projections, rotation, translation, and the concept of a “camera.” We will also learn about depth buffering, and why it is necessary. We will also see how a typical 3D application implements changes over time, such as animation.
Download all lessons as a zip from here: https://github.com/tomdalling/opengl-series/archive/master.zip
Setup instructions are available in the first article: Getting Started in Xcode, Visual C++, and Linux.
This article builds on the code from the previous article.
All the code in this series of articles is available from GitHub: https://github.com/tomdalling/opengl-series. You can download a zip of all the files from that page, or you can clone the repository if you are familiar with git. The code for this article can be found in the windows/03_matrices, osx/03_matrices, and linux/03_matrices directories.
This article is mostly about using matrices in 3D, so let’s start with a bit of matrix theory before getting into the code. I’m not going to focus much on the math, as there are lots of good online resources for that. We are using GLM to perform all the math for us. I will be focusing on what matrices can do for us in our 3D application.
Matrices are used to transform 3D coordinates. Possible transformations include rotation, scaling, translation, and perspective projection.
A matrix is a grid/table of numbers, like this:
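As a concrete instance, here is the standard 4×4 matrix for a 90° rotation around the Z axis. It is written under the assumption that coordinates are columns placed on the right-hand side of the multiplication, which is the convention the shaders later in this article use:

```latex
\[
R_z(90^\circ) =
\begin{pmatrix}
0 & -1 & 0 & 0 \\
1 &  0 & 0 & 0 \\
0 &  0 & 1 & 0 \\
0 &  0 & 0 & 1
\end{pmatrix}
\]
```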
The plural of matrix is “matrices” (pronounced “mey-tri-seez”). For example, “I have one matrix” and “I have two matrices.”
Different numbers produce different types of transformations. The matrix above will rotate coordinates 90° around the Z axis. We will be using GLM to create our matrices, so you do not need to understand how to calculate the numbers.
Matrices can have any number of rows and columns, but for 3D transformations we need a 4×4 matrix like the one shown above. Whenever I say “matrix,” I mean a 4×4 matrix.
When it comes to code, a matrix is usually represented as an array of floats. We will be using the glm::mat4 class to represent a 4×4 matrix.
The two most important matrix operations to understand are:
matrix × matrix = combined matrix
matrix × coord = transformed coord
When you multiply two matrices, the product is a new matrix that does both of the transformations.
For example, if you multiply a rotation matrix by a translation matrix, the result is a “combined” matrix that does both rotation and translation. Below is an example of this type of matrix multiplication:
Unlike normal multiplication, order is important in matrix multiplication. For example, if A and B are matrices, then A × B is not necessarily equal to B × A. The multiplication below is the same as the one above, but with the order changed:
Notice how the result is different if the order is changed. The animations below show why order is important. They perform the same transformations, but in a different order. The transformations are: a translation up the Y axis, and a 45° rotation.
If, while you are coding, you see that a transformation is wrong, then go back and check that your matrix multiplication is in the correct order.
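To make the ordering rule concrete, here is a minimal sketch in plain C++ that multiplies a rotation and a translation in both orders. The Mat4 struct and the Multiply, RotateZ90, Translate, Equal and DemoOrderMatters functions are hypothetical stand-ins written for this illustration, not part of GLM, and the sketch uses a 90° rotation rather than 45° so that every number is exact.

```cpp
#include <cassert>

// Hypothetical stand-in for glm::mat4: a 4x4 grid of floats, indexed m[row][column].
struct Mat4 {
    float m[4][4];
};

// Returns a * b. Applied to a coordinate, b's transformation happens first, then a's.
Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 result = {};  // starts as all zeros
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            for (int k = 0; k < 4; ++k) {
                result.m[row][col] += a.m[row][k] * b.m[k][col];
            }
        }
    }
    return result;
}

// A 90 degree rotation around the Z axis, written with exact values.
Mat4 RotateZ90() {
    return Mat4{{{0, -1, 0, 0},
                 {1,  0, 0, 0},
                 {0,  0, 1, 0},
                 {0,  0, 0, 1}}};
}

// A translation by (x, y, z).
Mat4 Translate(float x, float y, float z) {
    return Mat4{{{1, 0, 0, x},
                 {0, 1, 0, y},
                 {0, 0, 1, z},
                 {0, 0, 0, 1}}};
}

// True when every element of a matches the same element of b.
bool Equal(const Mat4& a, const Mat4& b) {
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            if (a.m[row][col] != b.m[row][col]) return false;
        }
    }
    return true;
}

// Usage example: the same two transformations, combined in different orders.
void DemoOrderMatters() {
    const Mat4 rotate = RotateZ90();
    const Mat4 moveUp = Translate(0.0f, 1.0f, 0.0f);
    const Mat4 moveThenRotate = Multiply(rotate, moveUp);  // translation applied first
    const Mat4 rotateThenMove = Multiply(moveUp, rotate);  // rotation applied first
    assert(!Equal(moveThenRotate, rotateThenMove));
}
```

In the first product the object is moved up and then swung around the origin, so it ends up to the left of the origin. In the second product it is rotated in place and then moved up, so it ends up above the origin.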
When you multiply a matrix by a coordinate, the product is a new coordinate that has the transformation applied to it.
For example, if you take the rotation matrix above and multiply it by the coordinate (1,1,0) the result is the “transformed” coordinate (-1,1,0). The transformed coordinate is the original coordinate after being rotated 90° around the Z axis. Below is an example of this type of matrix multiplication:
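The arithmetic behind that example can be sketched in a few lines of plain C++. The Vec4 and Mat4 structs and the Transform and DemoRotateCoordinate functions are hypothetical names made up for this sketch. In the real program GLM and the vertex shader do this work for us.

```cpp
#include <cassert>

// Hypothetical stand-ins for glm::vec4 and glm::mat4 (indexed m[row][column]).
struct Vec4 {
    float x, y, z, w;
};

struct Mat4 {
    float m[4][4];
};

// Multiplies a matrix by a coordinate: each output component is one row of the
// matrix multiplied element by element with the input coordinate, then summed.
Vec4 Transform(const Mat4& mat, const Vec4& v) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            out[row] += mat.m[row][col] * in[col];
        }
    }
    return Vec4{out[0], out[1], out[2], out[3]};
}

// Usage example: rotate the coordinate (1,1,0) by 90 degrees around the Z axis.
void DemoRotateCoordinate() {
    const Mat4 rotateZ90 = {{{0, -1, 0, 0},
                             {1,  0, 0, 0},
                             {0,  0, 1, 0},
                             {0,  0, 0, 1}}};
    const Vec4 rotated = Transform(rotateZ90, Vec4{1, 1, 0, 1});
    assert(rotated.x == -1.0f && rotated.y == 1.0f);
    assert(rotated.z == 0.0f && rotated.w == 1.0f);
}
```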
You will notice that the coordinates above are 4D, not 3D. They are in the format (X, Y, Z, W).
So why are we using 4D coordinates? Well, we require 4×4 matrices in order to do all the 3D transformations that we need. However, matrix multiplication requires that the number of columns in the left-hand side is equal to the number of rows on the right-hand side. This means that a 4×4 matrix can not be multiplied with a 3D coordinate, because the matrix has four columns but the coordinate only has three rows. We need to use 4D coordinates because 4×4 matrices require them in order to do matrix multiplication.
Some transformations, such as rotation and scaling, only require a 3×3 matrix. For those transformations, we don’t need 4D coordinates because 3D will work. However, translation requires at least a 4×3 matrix, and perspective projection requires a 4×4 matrix, and we need both of those, so we are forced to use 4D.
These are called homogeneous coordinates. In a later article, when we cover directional lighting, we will learn about what the ‘W’ dimension actually represents. At this point, however, we only need to know how to convert from 3D to 4D. When you convert a 3D coordinate into 4D, always set the 4th dimension, ‘W’, to 1. For example, the coordinate (22,33,44) becomes (22,33,44,1).
When you convert a 4D coordinate back to 3D, if the ‘W’ dimension is still set to 1 then you can just ignore it and use the X, Y and Z values like you normally would. If you find that your ‘W’ value has changed to something other than 1, well… you’re either doing something advanced, or you have a bug.
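Here is a small sketch of those two conversions in plain C++. Vec3, Vec4, ToHomogeneous and FromHomogeneous are hypothetical names for this illustration only. The assert simply encodes the rule of thumb above: for now, a ‘W’ other than 1 most likely means a bug.

```cpp
#include <cassert>

// Hypothetical stand-ins for glm::vec3 and glm::vec4.
struct Vec3 {
    float x, y, z;
};

struct Vec4 {
    float x, y, z, w;
};

// 3D to 4D: copy X, Y and Z, and always set W to 1.
Vec4 ToHomogeneous(const Vec3& v) {
    return Vec4{v.x, v.y, v.z, 1.0f};
}

// 4D to 3D: if W is still 1, it can simply be dropped.
Vec3 FromHomogeneous(const Vec4& v) {
    assert(v.w == 1.0f);  // anything else is "advanced, or a bug" at this stage
    return Vec3{v.x, v.y, v.z};
}
```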
The first change to the code will be to replace the triangle from the previous article with a cube.
I’ve made the cube out of triangles, by using two triangles on each of the six sides. In older versions of OpenGL, we could have used one square (GL_QUADS) per side instead of two triangles, but GL_QUADS has been removed from modern versions of OpenGL. The X, Y and Z coordinates will range from -1 to +1, which means the cube will be two units wide, and the center of the cube will be the origin (the origin is coordinate (0,0,0)). We will use one 256×256 texture for all sides of the cube. The data is going to be in the same format as the last article, so we won’t need to change much. Here is the new data for the cube:
GLfloat vertexData[] = { // X Y Z U V // bottom -1.0f,-1.0f,-1.0f, 0.0f, 0.0f, 1.0f,-1.0f,-1.0f, 1.0f, 0.0f, -1.0f,-1.0f, 1.0f, 0.0f, 1.0f, 1.0f,-1.0f,-1.0f, 1.0f, 0.0f, 1.0f,-1.0f, 1.0f, 1.0f, 1.0f, -1.0f,-1.0f, 1.0f, 0.0f, 1.0f, // top -1.0f, 1.0f,-1.0f, 0.0f, 0.0f, -1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f,-1.0f, 1.0f, 0.0f, 1.0f, 1.0f,-1.0f, 1.0f, 0.0f, -1.0f, 1.0f, 1.0f, 0.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, // front -1.0f,-1.0f, 1.0f, 1.0f, 0.0f, 1.0f,-1.0f, 1.0f, 0.0f, 0.0f, -1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,-1.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f, -1.0f, 1.0f, 1.0f, 1.0f, 1.0f, // back -1.0f,-1.0f,-1.0f, 0.0f, 0.0f, -1.0f, 1.0f,-1.0f, 0.0f, 1.0f, 1.0f,-1.0f,-1.0f, 1.0f, 0.0f, 1.0f,-1.0f,-1.0f, 1.0f, 0.0f, -1.0f, 1.0f,-1.0f, 0.0f, 1.0f, 1.0f, 1.0f,-1.0f, 1.0f, 1.0f, // left -1.0f,-1.0f, 1.0f, 0.0f, 1.0f, -1.0f, 1.0f,-1.0f, 1.0f, 0.0f, -1.0f,-1.0f,-1.0f, 0.0f, 0.0f, -1.0f,-1.0f, 1.0f, 0.0f, 1.0f, -1.0f, 1.0f, 1.0f, 1.0f, 1.0f, -1.0f, 1.0f,-1.0f, 1.0f, 0.0f, // right 1.0f,-1.0f, 1.0f, 1.0f, 1.0f, 1.0f,-1.0f,-1.0f, 1.0f, 0.0f, 1.0f, 1.0f,-1.0f, 0.0f, 0.0f, 1.0f,-1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,-1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f, 0.0f, 1.0f };
Now we need to change the call to glDrawArrays inside the Render function, because it is set to draw only the first triangle. There are six sides to a cube, each side has two triangles, and each triangle has three vertices, so the total number of vertices to draw is: 6 × 2 × 3 = 36. The new glDrawArrays call looks like this:
glDrawArrays(GL_TRIANGLES, 0, 6*2*3);
Lastly, we will use a different picture called “wooden-crate.jpg” as the texture. We just need to change the file name in the LoadTexture function, like this:
tdogl::Bitmap bmp = tdogl::Bitmap::bitmapFromFile(ResourcePath("wooden-crate.jpg"));
That’s it! We’ve provided all the data necessary to make a textured cube. If you run the program now, you will see this:
At this point, we have a couple of problems. Firstly, the cube is looking awfully 2D because we can only see one side of it. We need to “move the camera” so we are looking at the cube from a different angle. Secondly, something is wrong because the width and height of the cube should be the same, but the width is obviously larger than the height in the screenshot above. In order to fix these problems, we will have to learn a bit about matrices, and how they are used in 3D applications.
To understand how to make a “camera” in 3D, we must first understand the concept of the clip volume.
The clip volume is a cube. Whatever is inside the clip volume appears on the screen, and anything outside the clip volume is not visible. It has the exact same size as the cube we made above. It ranges from -1 to +1 on the X, Y and Z axes. -X is left, +X is right, -Y is bottom, +Y is top, +Z is away from the camera, and -Z is toward the camera.
Because our cube is the exact same size as the clip volume, all we can see is the front side of the cube.
This also explains why our cube looks wider than it is tall. The window displays whatever is in the clip volume. The left and right edges of the window are -1 and +1 on the X axis, the bottom and top edges of the window are -1 and +1 on the Y axis. The clip volume gets stretched to fit the size of the viewport in the window, so our cube doesn’t look square anymore.
We want to make a camera that can move around, look in different directions, and maybe zoom in and out. However, the clip volume can not be changed. It is always the same size, and in the same position. So, instead of moving the camera, we must move the entire 3D scene so that it fits inside the clip volume cube correctly. For example, if we want to rotate the camera to the right, we actually rotate the whole world to the left. If we want to move the camera closer to the player, we actually move the player closer to the camera. This is how “cameras” work in 3D, they transform the entire world so that it fits into the clip volume and looks correct.
When you walk somewhere, it feels like the world is standing still, and you are moving. But you can also imagine that you are not moving at all, and the whole world is rotating underneath your feet, like you are on a treadmill. This is the difference between “moving the camera” and “moving the world.” Either way, it looks exactly the same to the viewer.
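The simplest case of this idea, a camera that only changes position, can be sketched as follows. Vec3, WorldToCamera and DemoMoveCamera are hypothetical names for this illustration, and rotation is deliberately left out. Moving the camera in one direction is the same as moving every point in the world by the same amount in the opposite direction.

```cpp
#include <cassert>

// Hypothetical stand-in for glm::vec3.
struct Vec3 {
    float x, y, z;
};

// Where a point in the world ends up once the whole world has been shifted
// so that the camera sits at the origin.
Vec3 WorldToCamera(const Vec3& worldPoint, const Vec3& cameraPosition) {
    return Vec3{worldPoint.x - cameraPosition.x,
                worldPoint.y - cameraPosition.y,
                worldPoint.z - cameraPosition.z};
}

// Usage example: the camera moves two units to the right of a crate at the origin.
void DemoMoveCamera() {
    const Vec3 crate = {0.0f, 0.0f, 0.0f};
    const Vec3 seen = WorldToCamera(crate, Vec3{2.0f, 0.0f, 0.0f});
    // From the camera's point of view, the crate is now two units to the left.
    assert(seen.x == -2.0f && seen.y == 0.0f && seen.z == 0.0f);
}
```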
So how do we transform the 3D scene to fit into the clip volume? This is where we need to use matrices.
Let’s implement the camera matrix first. As explained above “cameras” in 3D are just transformations that affect the whole 3D scene. Because a camera is a transformation, we can represent the camera as a matrix.
Firstly, we need to include a GLM header that will allow us to create different kinds of matrices.
#include <glm/gtc/matrix_transform.hpp>
Next, we need to update the vertex shader. We will make the camera matrix a variable called camera, and every vertex will be multiplied by the camera matrix. This is how we will transform the entire 3D scene. The new vertex shader looks like this:
    #version 150

    uniform mat4 camera; //this is the new variable

    in vec3 vert;
    in vec2 vertTexCoord;

    out vec2 fragTexCoord;

    void main() {
        // Pass the tex coord straight through to the fragment shader
        fragTexCoord = vertTexCoord;

        // Transform the input vertex with the camera matrix
        gl_Position = camera * vec4(vert, 1);
    }
Now we need to set the camera shader variable from our C++ code. At the bottom of the LoadShaders function, we will add this code:
    gProgram->use();

    glm::mat4 camera = glm::lookAt(glm::vec3(3,3,3), glm::vec3(0,0,0), glm::vec3(0,1,0));
    gProgram->setUniform("camera", camera);

    gProgram->stopUsing();
The camera matrix isn’t going to change in this article, so we set it once, just after the shaders are created.
You can’t set shader variables unless the shader is in use, which is why we have gProgram->use() and gProgram->stopUsing().
We are going to use the glm::lookAt function to create a camera matrix for us. If you have used older versions of OpenGL, then you may have used the gluLookAt function to achieve the same thing, but gluLookAt has been removed from recent versions of OpenGL. The first argument glm::vec3(3,3,3) is the position of the camera. The second argument glm::vec3(0,0,0) is what the camera is looking at. The center of the cube is at (0,0,0) so that is where we want the camera to look. The final argument glm::vec3(0,1,0) is the “up” direction. We want the camera to be upright, so we set “up” to be the positive direction of the Y axis. If the camera were upside down, or tilted sideways, then this would be different.
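If you are curious what a “look at” function does with those three arguments, here is a rough sketch in plain C++. It follows the usual right-handed construction, in which the camera ends up looking down the negative Z axis. I am assuming that this matches what glm::lookAt produces by default, so treat it as an illustration and not as GLM’s actual source. Vec3, Mat4, LookAt, TransformPoint and the small vector helpers are all hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for glm::vec3 and glm::mat4 (indexed m[row][column]).
struct Vec3 {
    float x, y, z;
};

struct Mat4 {
    float m[4][4];
};

Vec3 Subtract(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }

float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 Cross(Vec3 a, Vec3 b) {
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

Vec3 Normalize(Vec3 v) {
    const float length = std::sqrt(Dot(v, v));
    assert(length > 0.0f);  // a zero-length direction cannot be normalized
    return Vec3{v.x / length, v.y / length, v.z / length};
}

// Builds a matrix that moves and rotates the world so that the camera sits at
// the origin, looking down the negative Z axis, with "up" pointing along +Y.
Mat4 LookAt(Vec3 eye, Vec3 target, Vec3 up) {
    const Vec3 forward = Normalize(Subtract(target, eye));
    const Vec3 right = Normalize(Cross(forward, up));
    const Vec3 cameraUp = Cross(right, forward);
    return Mat4{{{ right.x,    right.y,    right.z,    -Dot(right, eye)},
                 { cameraUp.x, cameraUp.y, cameraUp.z, -Dot(cameraUp, eye)},
                 {-forward.x, -forward.y, -forward.z,   Dot(forward, eye)},
                 { 0.0f,       0.0f,       0.0f,        1.0f}}};
}

// Applies the matrix to a 3D point, treating it as a 4D coordinate with W = 1.
Vec3 TransformPoint(const Mat4& mat, Vec3 p) {
    const float in[4] = {p.x, p.y, p.z, 1.0f};
    float out[3] = {0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 4; ++col) {
            out[row] += mat.m[row][col] * in[col];
        }
    }
    return Vec3{out[0], out[1], out[2]};
}
```

With the arguments used in this article, the center of the cube at (0,0,0) is transformed to a point straight ahead of the camera, roughly 5.2 units down the negative Z axis. That is well outside the -1 to +1 range of the clip volume.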
After we’ve generated the camera matrix, we set the camera shader variable with gProgram->setUniform("camera", camera). The setUniform method is part of the tdogl::Program class, and it calls glUniformMatrix4fv to set the shader variable.
That’s it! We now have a functioning camera.
Unfortunately, if you run the program now you will see a completely black screen. This is because our camera matrix has transformed the vertices of the cube so that they are outside of the clip volume. As mentioned above, if it’s outside of the clip volume then it will not be visible. To make it visible again, we need to use a projection matrix.
Remember that the clip volume is only two units wide, high, and deep. Let’s say one unit is equivalent to one meter in our 3D scene. That means we can only see two meters in front of the camera, which isn’t very convenient.
What we need to do is expand the clip volume so that more of the 3D scene can fit inside it. We can’t change the size of the clip volume, but we can shrink the entire scene so that more of it fits into the clip volume. This shrinking is a transformation, so we can represent it as a matrix. This is basically what projection matrices are used for.
Let’s update the vertex shader to add the projection matrix variable. The updated shader looks like this:
    #version 150

    uniform mat4 projection; //this is the new variable
    uniform mat4 camera;

    in vec3 vert;
    in vec2 vertTexCoord;

    out vec2 fragTexCoord;

    void main() {
        // Pass the tex coord straight through to the fragment shader
        fragTexCoord = vertTexCoord;

        // Apply camera and projection transformations to the vertex
        gl_Position = projection * camera * vec4(vert, 1);
    }
Notice the order of the matrix multiplication: projection * camera * vert. The camera transformation is applied first, and the projection is applied second. In matrix multiplication, the transformations are applied from right to left, or from closest to farthest from the vertex.
Now let’s set the projection shader variable from the C++ code, the same way we did for the camera variable. Inside the LoadShaders function, add the following code just above the camera matrix code:
    glm::mat4 projection = glm::perspective<float>(50.0, SCREEN_SIZE.x/SCREEN_SIZE.y, 0.1, 10.0);
    gProgram->setUniform("projection", projection);
If you have used older versions of OpenGL then you may have used gluPerspective to set the projection matrix, but gluPerspective has been removed from recent versions of OpenGL. Fortunately glm::perspective can be used as a replacement.
The first argument to glm::perspective is the “field of view” argument. This is an angle, in degrees, that specifies how wide the camera’s vision is. (The GLM version used in this series expects degrees. Newer GLM releases expect radians instead, so check the version you have.) A large field of view means the camera can see a lot of the scene, so the camera appears to be zoomed out. A small field of view means the camera can only see a small portion of the scene, so it appears to be zoomed in. The second argument is the “aspect” argument, which specifies the aspect ratio of the view. This is almost always set to width/height of the window. The second last argument is the “near plane.” The near plane is the front of the clip volume, and the value 0.1 says that the near plane is 0.1 units away from the camera. Anything that is closer to the camera than 0.1 units will not be visible. The near plane must be a value greater than zero. The last argument is the “far plane”, which is the back of the clip volume. The value 10.0 says that the camera will display everything that is within 10 units of the camera. Anything further than 10 units away will not be visible. Our cube is about five units away (the camera sits at (3,3,3), which is √27 ≈ 5.2 units from the origin), so it will be visible.
glm::perspective effectively fits a viewing frustum inside the clip volume. A frustum is like a pyramid with the top cut off. The flat base of the pyramid is the far plane. The flat part where the top has been cut off is the near plane. The field of view is the skinniness or fatness of the frustum. Anything inside the frustum will be displayed on the screen, and anything outside will be hidden.
A viewing frustum. Image by Tetromino.
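To see how the near and far planes relate to the clip volume, here is a sketch of the textbook OpenGL-style perspective matrix in plain C++. It is an illustration under assumptions, not GLM’s actual code. It takes the field of view in radians, and the 800×600 window size mentioned below is made up for the example. Vec4, Mat4, Perspective, Transform and DepthInClipVolume are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for glm::vec4 and glm::mat4 (indexed m[row][column]).
struct Vec4 {
    float x, y, z, w;
};

struct Mat4 {
    float m[4][4];
};

// Textbook OpenGL-style perspective matrix. The camera looks down the negative
// Z axis, and the field of view is given in radians.
Mat4 Perspective(float fovYRadians, float aspect, float nearPlane, float farPlane) {
    assert(nearPlane > 0.0f && farPlane > nearPlane);
    const float f = 1.0f / std::tan(fovYRadians / 2.0f);
    const float depth = farPlane - nearPlane;
    return Mat4{{{f / aspect, 0.0f, 0.0f, 0.0f},
                 {0.0f, f, 0.0f, 0.0f},
                 {0.0f, 0.0f, -(farPlane + nearPlane) / depth, -2.0f * farPlane * nearPlane / depth},
                 {0.0f, 0.0f, -1.0f, 0.0f}}};
}

// Multiplies a matrix by a 4D coordinate.
Vec4 Transform(const Mat4& mat, const Vec4& v) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            out[row] += mat.m[row][col] * in[col];
        }
    }
    return Vec4{out[0], out[1], out[2], out[3]};
}

// Z position inside the clip volume of a point that is directly in front of
// the camera, the given distance away. After projection, W is no longer 1, so
// the result has to be divided by W (OpenGL does this step automatically).
float DepthInClipVolume(const Mat4& projection, float distanceInFront) {
    const Vec4 clip = Transform(projection, Vec4{0.0f, 0.0f, -distanceInFront, 1.0f});
    return clip.z / clip.w;
}
```

With a near plane of 0.1 and a far plane of 10.0, a point 0.1 units in front of the camera lands on -1, the front of the clip volume, and a point 10 units away lands on +1, the back of the clip volume. Anything further away ends up beyond +1 and is not visible. This is also one of the “advanced” cases where the ‘W’ value legitimately stops being 1.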
With the combination of the camera matrix and the projection matrix, we should be able to see the cube now. If you run the program at this point you will see this:
That looks… almost correct.
The cube looks square now, instead of rectangular. This is because of the “aspect” argument of the call to glm::perspective, which corrects the aspect ratio based on the width and height of the window.
Unfortunately, the screenshot above shows that some of the back sides of the cube are being rendered over the top of the front sides. We obviously don’t want this to happen, so we will enable depth buffering to fix the problem.
By default, OpenGL draws over the top of whatever was previously drawn. If the back side of an object is drawn after the front, then the back will be drawn over the top of the front. Depth buffering is one way to stop things in the background from being drawn over the top of things in the foreground.
With depth buffering enabled, every pixel that is drawn knows how far away from the camera it is. It stores this distance as a number in the “depth buffer.” When you draw a pixel over an existing pixel, OpenGL will look at the depth buffer to determine which pixel is closer to the camera. If the new pixel being drawn is closer to the camera, it will overwrite the existing pixel. If the preexisting pixel is closer to the camera, then the new pixel being drawn is discarded. That is, a preexisting pixel only gets overwritten if the new pixel is closer to the camera. This is called “depth testing.”
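The depth test described above can be sketched as a tiny software “framebuffer” in plain C++. This only illustrates the idea. OpenGL performs the test on the graphics card, and DepthBufferedImage with its methods is a hypothetical class made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// A tiny software image with one color and one depth value per pixel.
class DepthBufferedImage {
public:
    DepthBufferedImage(int width, int height)
        : width_(width),
          height_(height),
          color_(static_cast<std::size_t>(width) * height, 0u),
          // Start every pixel "infinitely far away", like a freshly cleared depth buffer.
          depth_(static_cast<std::size_t>(width) * height,
                 std::numeric_limits<float>::infinity()) {}

    // Draws a pixel only if it is closer to the camera than what is already there.
    // Returns true if the pixel was written, false if it was discarded.
    bool DrawPixel(int x, int y, float distanceToCamera, unsigned color) {
        const std::size_t i = Index(x, y);
        if (distanceToCamera < depth_[i]) {
            depth_[i] = distanceToCamera;
            color_[i] = color;
            return true;
        }
        return false;
    }

    unsigned ColorAt(int x, int y) const { return color_[Index(x, y)]; }

private:
    std::size_t Index(int x, int y) const {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        return static_cast<std::size_t>(y) * width_ + x;
    }

    int width_;
    int height_;
    std::vector<unsigned> color_;
    std::vector<float> depth_;
};
```

With this test in place, the order in which the sides of the cube are drawn no longer matters: a back side drawn after a front side is discarded, pixel by pixel, because it is further away.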
Enabling depth buffering is pretty easy in OpenGL. First we need to request a depth buffer when we create our window with glfwOpenWindow. The new call to glfwOpenWindow looks like this:
glfwOpenWindow((int)SCREEN_SIZE.x, (int)SCREEN_SIZE.y, 8, 8, 8, 8, 16, 0, GLFW_WINDOW)
The only difference is that the depth buffer argument is now 16 instead of 0. This means that each pixel stores its distance to the camera as a 16 bit number. The more bits the depth buffer has, the more accurate it is.
Inside the AppMain function, after the call to glewInit, we add this code:
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
This tells OpenGL to enable depth testing. The glDepthFunc call says that pixels should be overwritten if the distance to the camera is less than the existing pixel’s distance.
The last step is to clear the depth buffer every time we render a new frame. If we didn’t do this then the old pixel distances would stay in the buffer, which would ruin the rendering of the new frame. Inside the Render function we change the glClear call to this:
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
Finally, with depth buffering enabled, if you run the program the cube will be rendered correctly:
Congratulations if you’ve made it this far! The final thing to implement is the spinning animation of the cube.
How are we going to implement the spinning? You guessed it: another matrix. The difference between this matrix and the others is that this matrix will change every frame. The other matrices are constant.
I’m going to call this new matrix the “model” matrix. In a normal 3D engine, each object has a different model matrix. The camera and the projection matrices are the same for the whole scene, but the model matrices are different for each object in the scene. The model matrix is used to put each object in the correct position (translation), to make the object face the correct direction (rotation), and to change the size of the object (scaling). We only have one object in our 3D scene, though, so we only have one model matrix.
Let’s add a model matrix variable to the vertex shader, like we did for the camera and projection. The final version of the vertex shader looks like this:
    #version 150

    uniform mat4 projection;
    uniform mat4 camera;
    uniform mat4 model; //this is the new variable

    in vec3 vert;
    in vec2 vertTexCoord;

    out vec2 fragTexCoord;

    void main() {
        // Pass the tex coord straight through to the fragment shader
        fragTexCoord = vertTexCoord;

        // Apply all matrix transformations to vert
        gl_Position = projection * camera * model * vec4(vert, 1);
    }
Notice the order of the matrix multiplication. The model matrix is the closest one to the vert variable, which means the model matrix will be applied first, then the camera, and lastly the projection.
Now we need to set the new model shader variable. Unlike the camera and projection variables, the model variable needs to be set every frame, so we will set it inside the Render function. Just after the call to gProgram->use() add this code:
gProgram->setUniform("model", glm::rotate(glm::mat4(), 45.0f, glm::vec3(0,1,0)));
We are using the glm::rotate function to create a rotation transformation matrix. The first argument is an existing matrix to combine with the rotation matrix. We don’t have an existing matrix to combine, so we just pass in a new glm::mat4 object. The next argument is the angle of the rotation, or how much to rotate. Let’s just set this to a constant 45° for now. The last argument is the axis of the rotation. Imagine rotation as stabbing a skewer through an object, and then twisting the skewer. The skewer is the axis, and the angle is how much you twist it. In our case, we will use a vertical skewer, so the cube will rotate like it is sitting on a flat table.
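For the curious, this is roughly the arithmetic that a rotation around a vertical skewer performs on each coordinate. It is a sketch in plain C++ under the usual right-handed convention. Vec3 and RotateAroundY are hypothetical names, and in the real program glm::rotate builds the equivalent matrix for us.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for glm::vec3.
struct Vec3 {
    float x, y, z;
};

// Rotates a point around the Y axis (the vertical skewer) by an angle in degrees.
// The height (Y) never changes; only X and Z are mixed together.
Vec3 RotateAroundY(Vec3 p, float degrees) {
    const float radians = degrees * 3.14159265358979f / 180.0f;
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    return Vec3{ c * p.x + s * p.z,
                 p.y,
                -s * p.x + c * p.z};
}
```

For example, rotating the point (1,5,0) by 90° gives approximately (0,5,-1): the point has swung a quarter turn around the skewer, and its height of 5 is untouched.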
If you run the program now, you will see that the cube has been rotated:
It isn’t spinning yet, because the matrix doesn’t change: it is always set to rotate 45°. The last step is to change the rotation angle a little bit every frame.
First, let’s add a new global called gDegreesRotated.
GLfloat gDegreesRotated = 0.0f;
Every frame, we will slightly increase gDegreesRotated, and then we will use it to calculate a new rotation matrix. This is how we will achieve animation. What we want to do is update, draw, update, draw, update, draw, in that pattern.
Let’s make a function called Update that will slightly increase gDegreesRotated every time it is called:
    void Update() {
        //rotate by 1 degree
        gDegreesRotated += 1.0f;

        //don't go over 360 degrees
        while(gDegreesRotated > 360.0f) gDegreesRotated -= 360.0f;
    }
Now we need to call the Update function every time we draw a new frame. Let’s add it to the loop in AppMain, just before the call to Render.
    while(glfwGetWindowParam(GLFW_OPENED)){
        // update the rotation animation
        Update();

        // draw one frame
        Render();
    }
Now we just have to recalculate the model matrix based on the gDegreesRotated variable. Inside the Render function we’ll modify the code to set the model matrix:
gProgram->setUniform("model", glm::rotate(glm::mat4(), gDegreesRotated, glm::vec3(0,1,0)));
The only difference is that we are using the variable gDegreesRotated instead of a constant 45°.
If you run the program now you should see a nice, smooth spinning animation of the cube. The only problem is that the spinning speed is tied to how many frames per second (FPS) we can render. If the FPS goes up, then the cube spins faster. If the FPS goes down, the cube spins more slowly. This isn’t ideal. A program should update correctly regardless of how many FPS it is running at.
To make the program run correctly, independent of FPS, the animation should be updated per second, not per frame. The simplest way to do this is to count the number of seconds elapsed since the last update, and update accordingly. Let’s change the Update function so that it takes an argument called secondsElapsed:
    void Update(float secondsElapsed) {
        const GLfloat degreesPerSecond = 180.0f;
        gDegreesRotated += secondsElapsed * degreesPerSecond;
        while(gDegreesRotated > 360.0f) gDegreesRotated -= 360.0f;
    }
This will make the cube spin at a constant 180° per second, no matter what the frame rate is.
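To check that claim without opening a window, here is a small simulation in plain C++. It runs the same update logic for half a second at two different frame rates. AdvanceRotation mirrors the Update function above but takes and returns the angle instead of using a global, and both it and SimulateSpin are hypothetical helpers written for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Same logic as Update(secondsElapsed), without the global variable.
float AdvanceRotation(float degreesRotated, float secondsElapsed) {
    const float degreesPerSecond = 180.0f;
    degreesRotated += secondsElapsed * degreesPerSecond;
    while (degreesRotated > 360.0f) degreesRotated -= 360.0f;
    return degreesRotated;
}

// Pretends to render frameCount evenly spaced frames over totalSeconds,
// and returns how far the cube has rotated at the end.
float SimulateSpin(float totalSeconds, int frameCount) {
    assert(frameCount > 0);
    const float secondsPerFrame = totalSeconds / frameCount;
    float degrees = 0.0f;
    for (int frame = 0; frame < frameCount; ++frame) {
        degrees = AdvanceRotation(degrees, secondsPerFrame);
    }
    return degrees;
}
```

Calling SimulateSpin(0.5f, 15) (about 30 FPS) and SimulateSpin(0.5f, 30) (about 60 FPS) both give a rotation of roughly 90°, apart from tiny floating point rounding differences. With the earlier per-frame version, the 60 FPS run would have rotated twice as far.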
In the AppMain loop, we need to calculate how many seconds have elapsed since the last update. The new loop looks like this:
    double lastTime = glfwGetTime();
    while(glfwGetWindowParam(GLFW_OPENED)){
        // update the scene based on the time elapsed since last update
        double thisTime = glfwGetTime();
        Update(thisTime - lastTime);
        lastTime = thisTime;

        // draw one frame
        Render();
    }
glfwGetTime returns the number of seconds elapsed since the program started running.
We use a variable called lastTime to remember when the last update happened. Each iteration of the loop, we get the current time in the variable thisTime. The time elapsed since the last update is thisTime - lastTime. After the update is finished, we set lastTime = thisTime so that the next iteration through the loop will work.
This is a simple way to update based on time. There is a better way to do time-based updates, but it is more complicated and we don’t need it yet.
In the next article we will make the tdogl::Camera class, which will allow us to have a first-person-shooter-style camera that can move around with the keyboard, look in different directions with the mouse, and zoom in and out with the mouse wheel.