Basic 3D
18 Oct 2024Introduction
In this post, we’ll explore the foundations of 3D graphics, focusing on vector math, matrices, and transformations. By the end, you’ll understand how objects are transformed in 3D space and projected onto the screen. We’ll use Haskell for the code examples, as it closely resembles the mathematical operations involved.
Vectors
A 4D vector has four components: \(x\), \(y\), \(z\), and \(w\).
In 3D graphics, we often work with 4D vectors (also called homogeneous coordinates) rather than 3D vectors. The extra dimension allows us to represent translations (which are not linear transformations) as matrix operations, keeping the math uniform.
A 4D vector is written as:
\[\boldsymbol{v} = \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}\]Where:
- \(x, y, z\) represent the position in 3D space
- \(w\) is a homogeneous coordinate that allows us to apply translations and perspective transformations.
The extra \(w\)-component is crucial for distinguishing between points and directions (i.e., vectors). When \(w = 1\), the vector represents a point. When \(w = 0\), it represents a direction or vector.
Operations
We need to perform various operations on vectors in 3D space (or 4D homogeneous space), including addition, subtraction, multiplication, dot products, and normalization.
Addition
Given two vectors \(\boldsymbol{a}\) and \(\boldsymbol{b}\):
\[\boldsymbol{a} + \boldsymbol{b} = \begin{bmatrix} a_x \\ a_y \\ a_z \\ a_w \end{bmatrix} + \begin{bmatrix} b_x \\ b_y \\ b_z \\ b_w \end{bmatrix} = \begin{bmatrix} a_x + b_x \\ a_y + b_y \\ a_z + b_z \\ a_w + b_w \end{bmatrix}\]Subtraction
\[\boldsymbol{a} - \boldsymbol{b} = \begin{bmatrix} a_x \\ a_y \\ a_z \\ a_w \end{bmatrix} - \begin{bmatrix} b_x \\ b_y \\ b_z \\ b_w \end{bmatrix} = \begin{bmatrix} a_x - b_x \\ a_y - b_y \\ a_z - b_z \\ a_w - b_w \end{bmatrix}\]Dot Product
The dot product of two 3D vectors \(\boldsymbol{a} \cdot \boldsymbol{b}\) gives a scalar value:
\[\boldsymbol{a} \cdot \boldsymbol{b} = a_x \cdot b_x + a_y \cdot b_y + a_z \cdot b_z\]Cross Product
The cross product is a vector operation that takes two 3D vectors and returns a third vector that is orthogonal (perpendicular) to both of the input vectors. The cross product is commonly used in 3D graphics to calculate surface normals, among other things.
For two 3D vectors \(\boldsymbol{a}\) and \(\boldsymbol{b}\), the cross product \(\boldsymbol{a} \times \boldsymbol{b}\) is defined as:
\[\boldsymbol{a} \times \boldsymbol{b} = \begin{bmatrix} a_y \cdot b_z - a_z \cdot b_y \\ a_z \cdot b_x - a_x \cdot b_z \\ a_x \cdot b_y - a_y \cdot b_x \end{bmatrix}\]This resulting vector is perpendicular to both \(\boldsymbol{a}\) and \(\boldsymbol{b}\).
To implement the cross product in Haskell, we will only operate on the \(x\), \(y\), and \(z\) components of a
Vec4
(ignoring \(w\)) since the cross product is defined for 3D vectors.
Length
The length or magnitude of a vector \(\boldsymbol{v}\) is:
\[\lVert \boldsymbol{v} \rVert = \sqrt{x^2 + y^2 + z^2}\]Normalization
To normalize a vector is to scale it so that its length is 1:
\[\boldsymbol{v}_{\text{norm}} = \frac{\boldsymbol{v}}{\lVert \boldsymbol{v} \rVert}\]Matrices
A 4x4 matrix consists of 16 elements. We’ll represent it as a flat structure with 16 values:
\[M = \begin{bmatrix} m_{00} & m_{01} & m_{02} & m_{03} \\ m_{10} & m_{11} & m_{12} & m_{13} \\ m_{20} & m_{21} & m_{22} & m_{23} \\ m_{30} & m_{31} & m_{32} & m_{33} \end{bmatrix}\]In Haskell, we define it as:
In 3D graphics, transformations are applied to objects using 4x4 matrices. These matrices allow us to perform operations like translation, scaling, and rotation.
Operations
Addition
Adding two matrices \(A\) and \(B\) is done element-wise:
\[A + B = \begin{bmatrix} a_{00} + b_{00} & a_{01} + b_{01} & \dots \\ a_{10} + b_{10} & \dots & \dots \end{bmatrix}\]Multiplication
Multiplying two matrices \(A\) and \(B\):
\[C = A \cdot B\]Where each element \(c_{ij}\) of the resulting matrix is calculated as:
\[c_{ij} = a_{i0} \cdot b_{0j} + a_{i1} \cdot b_{1j} + a_{i2} \cdot b_{2j} + a_{i3} \cdot b_{3j}\]Vector Multiply
We transform a vector by a matrix using a multiply operation.
\[\boldsymbol{v'} = M \cdot \boldsymbol{v}\]3D Transformations
In 3D graphics, we apply transformations like translation, scaling, and rotation using matrices. These transformations are applied to 4D vectors, and the operations are represented as matrix multiplications.
Identity Matrix
The identity matrix is a 4x4 matrix that leaves a vector unchanged when multiplied:
\[I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]Translation Matrix
To translate a point by \((t_x, t_y, t_z)\), we use the translation matrix:
\[T = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}\]Scale Matrix
Scaling a vector by \(s_x, s_y, s_z\) is done using the following matrix:
\[S = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]Rotation Matrix
In 3D graphics, we frequently need to rotate objects around the X, Y, and Z axes. Each axis has its own corresponding rotation matrix, which we use to apply the rotation transformation to points in 3D space.
A rotation around the X-axis by an angle \(\theta\) is represented by the following matrix:
\[R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \theta & -\sin \theta & 0 \\ 0 & \sin \theta & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]A rotation around the Y-axis by an angle \(\theta\) is represented by the following matrix:
\[R_y(\theta) = \begin{bmatrix} \cos \theta & 0 & \sin \theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin \theta & 0 & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]A rotation around the Z-axis by an angle \(\theta\) is represented by the following matrix:
\[R_z(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta & 0 & 0 \\ \sin \theta & \cos \theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]Combining Rotation Matrices
To rotate an object in 3D space about multiple axes, we can multiply the individual rotation matrices. The order of multiplication is crucial since matrix multiplication is not commutative. Typically, we perform rotations in the order of Z, then Y, then X (if required).
\[R = R_x(\theta_x) \cdot R_y(\theta_y) \cdot R_z(\theta_z)\]Rotation Matrices in Haskell
Let’s implement the rotation matrices for the X, Y, and Z axes in Haskell:
Example: Rotating an Object
To apply a rotation to an object, you can combine the rotation matrices and multiply them by the object’s position vector. For instance, to rotate a point by \(\theta_x\), \(\theta_y\), and \(\theta_z\), you can multiply the corresponding matrices:
3D Transformations and Projection
Local vs World Coordinates
When dealing with 3D objects, we distinguish between local coordinates (relative to an object) and world coordinates (relative to the entire scene). Vectors are transformed from local to world coordinates by multiplying them by transformation matrices.
Projection Calculation
To project a 3D point onto a 2D screen, we use a projection matrix. The projection matrix transforms 3D coordinates into 2D coordinates by applying a perspective transformation.
A simple perspective projection matrix looks like this:
\[P = \begin{bmatrix} \frac{1}{\text{aspect}} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{z_f + z_n}{z_n - z_f} & \frac{2z_f z_n}{z_n - z_f} \\ 0 & 0 & -1 & 0 \end{bmatrix}\]Where:
- \(z_f\) is the far clipping plane
- \(z_n\) is the near clipping plane
- \(\text{aspect}\) is the aspect ratio of the screen
Reducing a 4D Vector to 2D Screen Coordinates
In 3D graphics, we often work with 4D vectors in homogeneous coordinates. To display a 3D point on a 2D screen, we need to project that point using a projection matrix and then convert the resulting 4D vector into 2D coordinates that we can draw on the screen.
Here’s how this process works:
Step 1: Apply the Projection Matrix
We start with a 4D vector \(\boldsymbol{v}\) in homogeneous coordinates:
\[\boldsymbol{v} = \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}\]We apply the projection matrix \(P\), which transforms the 4D point into clip space (a space where coordinates can be projected to the screen).
The projection matrix looks something like this for perspective projection:
\[P = \begin{bmatrix} \frac{1}{\text{aspect}} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\frac{fov}{2})} & 0 & 0 \\ 0 & 0 & \frac{z_f + z_n}{z_n - z_f} & \frac{2z_f z_n}{z_n - z_f} \\ 0 & 0 & -1 & 0 \end{bmatrix}\]Multiplying \(\boldsymbol{v}\) by \(P\) gives us:
\[\boldsymbol{v'} = P \cdot \boldsymbol{v} = \begin{bmatrix} x' \\ y' \\ z' \\ w' \end{bmatrix}\]Where:
\[\boldsymbol{v'} = \begin{bmatrix} x' \\ y' \\ z' \\ w' \end{bmatrix} = P \cdot \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}\]Step 2: Perspective Divide
To convert the 4D vector \(\boldsymbol{v'}\) to 3D space, we perform the perspective divide. This means dividing the \(x'\), \(y'\), and \(z'\) components by the \(w'\) component.
The resulting 3D point \(\boldsymbol{v_{3D}}\) is:
\[\boldsymbol{v_{3D}} = \begin{bmatrix} \frac{x'}{w'} \\ \frac{y'}{w'} \\ \frac{z'}{w'} \end{bmatrix}\]Step 3: Convert to Screen Coordinates
To get the final 2D screen coordinates, we need to convert the 3D point into normalized device coordinates (NDC), which range from -1 to 1. The screen coordinates ( (x_{\text{screen}}, y_{\text{screen}}) ) are then obtained by scaling these values to the screen dimensions:
\[x_{\text{screen}} = \left( \frac{x_{3D} + 1}{2} \right) \cdot \text{width}\] \[y_{\text{screen}} = \left( \frac{1 - y_{3D}}{2} \right) \cdot \text{height}\]The factor \(\frac{x_{3D} + 1}{2}\) maps the normalized \(x\)-coordinate from the range [-1, 1] to [0, 1], and multiplying by the screen width gives us the pixel position. The same applies for \(y_{\text{screen}}\), but we invert the \(y_{3D}\) coordinate to account for the fact that screen coordinates typically have the origin at the top-left corner, whereas the NDC system has the origin at the center.
Putting it All Together in Haskell
Here’s how you can perform this transformation in Haskell:
Example
Suppose we have the following vector and projection matrix:
This will give you the screen coordinates \(x_{\text{screen}}\) and \(y_{\text{screen}}\), where the 3D point \((1, 1, 1)\) will be projected on a 1920x1080 display.
Conclusion
This has been some of the basic 3D concepts presented through Haskell. In future posts, we’ll use this code to create some basic animations on screen.