
Computers Understanding Text

Introduction

When working with Natural Language Processing (NLP), one of the first challenges you encounter is how to convert human-readable text into a format that machines can understand. Computers don’t natively understand words or sentences; they operate in numbers.

So, how do we get from words to something a machine can process?

This is where text preprocessing comes in.

Text preprocessing involves several steps to prepare raw text for analysis. In this post, we’ll walk through the foundational techniques in preprocessing: tokenization, lowercasing, removing stop words, and stemming/lemmatization. These steps ensure that our text is in a clean, structured format for further processing like word embeddings or more complex NLP models.

Tokenization: Breaking Down the Text

What is Tokenization?

Tokenization is the process of breaking a string of text into smaller pieces, usually words or subwords. In essence, it’s the process of splitting sentences into tokens, which are the basic units for further NLP tasks.

For example, consider the sentence:

"I love NLP!"

Tokenization would break this into:

["I", "love", "NLP", "!"]

This is a simple example where each token corresponds to a word or punctuation. However, tokenization can get more complex depending on the language and the task. For instance, some tokenizers split contractions like “can’t” into ["can", "'t"], while others might treat it as one token. Tokenization also becomes more challenging in languages that don’t have spaces between words, like Chinese or Japanese.

Code Example: Tokenization in Python

Let’s look at a basic example of tokenization using Python’s nltk library:

import nltk
nltk.download('punkt_tab')
from nltk.tokenize import word_tokenize

sentence = "I love NLP!"
tokens = word_tokenize(sentence)
print(tokens)

The output is simply:

['I', 'love', 'NLP', '!']

Code Example: Sentence Tokenization

Tokenization can also occur at the sentence level, which means breaking down a paragraph or a larger body of text into individual sentences. This is helpful for tasks like summarization or sentiment analysis, where sentence boundaries matter.

from nltk.tokenize import sent_tokenize

text = "NLP is fun. It's amazing how machines can understand text!"
sentences = sent_tokenize(text)
print(sentences)

The output is now split on sentence boundaries:

['NLP is fun.', "It's amazing how machines can understand text!"]

Lowercasing: Making Text Uniform

In English, the words “Dog” and “dog” mean the same thing, but to a computer, they are two different tokens. Lowercasing is a simple yet powerful step in text preprocessing. By converting everything to lowercase, we reduce the complexity of the text and ensure that words like “NLP” and “nlp” are treated identically.

We can achieve this simply with the .lower() method on a string; for example, "Dog".lower() returns "dog".

This step becomes crucial when dealing with large text corpora, as it avoids treating different capitalizations of the same word as distinct entities.

Removing Stop Words: Filtering Out Common Words

Stop words are commonly used words that don’t carry significant meaning in many tasks, such as “and”, “the”, and “is”. Removing stop words helps reduce noise in the data and improves the efficiency of downstream models by focusing only on the meaningful parts of the text.

Many libraries provide lists of stop words, but the ideal list can vary depending on the task.

import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

stop_words = set(stopwords.words('english'))
tokens = ["I", "love", "NLP", "!"]
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print(filtered_tokens)

Only the meaningful words remain; the stop word “I” has been removed:

['love', 'NLP', '!']

Stemming and Lemmatization: Reducing Words to Their Root Forms

Another key step in preprocessing is reducing words to their base or root form. There are two common approaches:

Stemming: This cuts off word endings to get to the base form, which can sometimes be rough. For example, “running”, “runner”, and “ran” might all be reduced to “run”.

Lemmatization: This is a more refined process that looks at the word’s context and reduces it to its dictionary form. For instance, “better” would be lemmatized to “good”.

Here’s an example using nltk for both stemming and lemmatization:

from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["running", "runner", "ran"]
stemmed_words = [ps.stem(word) for word in words]
print(stemmed_words)

The output shows how rough stemming can be; only “running” is reduced to “run”, while “runner” and “ran” are left untouched:

['run', 'runner', 'ran']

An example of Lemmatization looks like this:

import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
# The lemmatizer needs a part-of-speech hint: "v" (verb) for "running" and "ran", "a" (adjective) for "better"
words = [("running", "v"), ("better", "a"), ("ran", "v")]
lemmatized_words = [lemmatizer.lemmatize(word, pos=pos) for word, pos in words]
print(lemmatized_words)

Each word is reduced to its dictionary form:

['run', 'good', 'run']

Conclusion

Text preprocessing is a crucial first step in any NLP project. By breaking down text through tokenization, making it uniform with lowercasing, and reducing unnecessary noise with stop word removal and stemming/lemmatization, we can create a clean and structured dataset for further analysis or model training. These steps form the foundation upon which more advanced techniques, such as word embeddings and machine learning models, are built.

Word Embeddings

Introduction

Word embeddings are one of the most significant advancements in natural language processing (NLP). They allow us to transform words or sentences into vectors, where each word is represented by a point in a high-dimensional space. The core idea is that words with similar meanings are close to each other in this space, making it possible to use mathematical operations on these vectors to uncover relationships between words.

In this post, we’ll explore how to create word embeddings using a pre-trained model, and we’ll perform various vector operations to see how these embeddings capture semantic relationships. We’ll cover examples like analogy generation, word similarity, and how these embeddings can be leveraged for search tasks.

What Are Word Embeddings?

Word embeddings are dense vector representations of words, where each word is mapped to a point in a continuous vector space. Unlike older techniques (such as one-hot encoding) that give each word a unique identifier, embeddings represent words in a way that captures semantic relationships, such as similarity and analogy.

For example, embeddings can represent the relationship:

king - man + woman = queen

This is made possible because words that are semantically similar (e.g., “king” and “queen”) have vector representations that are close together in space, while words that are opposites (e.g., “good” and “bad”) may have vectors pointing in opposite directions.

Gensim

Let’s begin by loading a pre-trained word embedding model. We’ll use the glove-wiki-gigaword-50 model, which provides 50-dimensional vectors for many common words.

import gensim.downloader as api

# Load the pre-trained GloVe Word2Vec model
model = api.load("glove-wiki-gigaword-50")

This might take a moment to download. It’s not too big.

Now that we have the model, let’s try converting some words into vectors.

Converting Words to Vectors

We can take individual words and get their vector representations. Let’s look at the vectors for “king,” “queen,” “man,” and “woman.”

# Example words
word1 = "king"
word2 = "queen"
word3 = "man"
word4 = "woman"

# Get the vectors for each word
vector_king = model[word1]
vector_queen = model[word2]
vector_man = model[word3]
vector_woman = model[word4]

# Print the vector for 'king'
print(f"Vector for '{word1}':\n{vector_king}")

You’ll see that each word is represented as a 50-dimensional vector. These vectors capture the meanings of the words in such a way that we can manipulate them mathematically.

Vector for 'king':
[ 0.50451   0.68607  -0.59517  -0.022801  0.60046  -0.13498  -0.08813
0.47377  -0.61798  -0.31012  -0.076666  1.493    -0.034189 -0.98173
0.68229   0.81722  -0.51874  -0.31503  -0.55809   0.66421   0.1961
-0.13495  -0.11476  -0.30344   0.41177  -2.223    -1.0756   -1.0783
-0.34354   0.33505   1.9927   -0.04234  -0.64319   0.71125   0.49159
0.16754   0.34344  -0.25663  -0.8523    0.1661    0.40102   1.1685
-1.0137   -0.21585  -0.15155   0.78321  -0.91241  -1.6106   -0.64426
-0.51042 ]

Performing Vector Arithmetic

One of the most famous examples of vector arithmetic in word embeddings is the analogy:

king - man + woman = queen

We can perform this operation by subtracting the vector for “man” from “king” and then adding the vector for “woman.” Let’s try this and see what word is closest to the resulting vector.

# Perform vector arithmetic
result_vector = vector_king - vector_man + vector_woman

# Find the closest word to the resulting vector
similar_words = model.similar_by_vector(result_vector, topn=3)

# Print the result
print("Result of 'king - man + woman':", similar_words)

You should find that the word closest to the resulting vector is “queen,” demonstrating that the model captures the gender relationship between “king” and “queen.”

Measuring Word Similarity with Cosine Similarity

Another key operation you can perform on word embeddings is measuring the similarity between two words. The most common way to do this is by calculating the cosine similarity between the two vectors. The cosine similarity between two vectors is defined as:

\[\text{cosine similarity} = \frac{A \cdot B}{\|A\| \|B\|}\]

This returns a value between -1 and 1:

  • 1 means the vectors are identical (the words are very similar),
  • 0 means the vectors are orthogonal (unrelated words),
  • -1 means the vectors are pointing in opposite directions (possibly antonyms).

Let’s measure the similarity between related words like “apple” and “fruit,” and compare it to unrelated words like “apple” and “car.”

import numpy as np
from numpy.linalg import norm

# Function to calculate cosine similarity
def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (norm(vec1) * norm(vec2))

# Get vectors for 'apple', 'fruit', and 'car'
vector_apple = model['apple']
vector_fruit = model['fruit']
vector_car = model['car']

# Calculate cosine similarity
similarity_apple_fruit = cosine_similarity(vector_apple, vector_fruit)
similarity_apple_car = cosine_similarity(vector_apple, vector_car)

print(f"Cosine Similarity between 'apple' and 'fruit': {similarity_apple_fruit:.4f}")
print(f"Cosine Similarity between 'apple' and 'car': {similarity_apple_car:.4f}")

You will see that the cosine similarity between “apple” and “fruit” is much higher than that between “apple” and “car,” illustrating the semantic relationship between “apple” and “fruit.”

Cosine Similarity between 'apple' and 'fruit': 0.5918
Cosine Similarity between 'apple' and 'car': 0.3952

Search Using Word Embeddings

Another powerful use of word embeddings is in search tasks. If you want to find words that are most similar to a given word, you can use the model’s similar_by_word function to retrieve the top N most similar words. Here’s how you can search for words most similar to “apple”:

# Find words most similar to 'apple'
similar_words_to_apple = model.similar_by_word('apple', topn=5)
print("Words most similar to 'apple':", similar_words_to_apple)

You can see here that “apple” is treated in the proper-noun sense, referring to the company Apple.

Words most similar to 'apple': [
('blackberry', 0.7543067336082458), 
('chips', 0.7438644170761108), 
('iphone', 0.7429664134979248), 
('microsoft', 0.7334205508232117), 
('ipad', 0.7331036329269409)
]

Each of these words has strong relevance to the company.

Averaging Word Vectors

Another interesting operation is averaging word vectors. This allows us to combine the meaning of two words into a single vector. For instance, we could average the vectors for “apple” and “orange” to get a vector that represents something like “fruit.”

# Average of 'apple' and 'orange'
vector_fruit_avg = (model['apple'] + model['orange']) / 2

# Find the words closest to the average vector
similar_to_fruit_avg = model.similar_by_vector(vector_fruit_avg, topn=5)
print("Words similar to the average of 'apple' and 'orange':", similar_to_fruit_avg)

A number of words are related to both “apple” and “orange”; the averaged vector captures this overlap.

Words similar to the average of 'apple' and 'orange': [
('apple', 0.8868993520736694), 
('orange', 0.8670367002487183), 
('juice', 0.7459520101547241), 
('cherry', 0.7071465849876404), 
('cream', 0.7013142704963684)
]

Conclusion

Word embeddings are a powerful way to represent the meaning of words as vectors in a high-dimensional space. By using simple mathematical operations, such as vector arithmetic and cosine similarity, we can uncover a variety of semantic relationships between words. These operations allow embeddings to be used in tasks such as analogy generation, search, and clustering.

In this post, we explored how to use pre-trained word embeddings, perform vector operations, and leverage them for real-world tasks. These foundational concepts are what power much of the magic behind modern NLP techniques, from search engines to chatbots and more.

Straight lines

Introduction

In mathematics, the straight line equation \(y = mx + c\) is one of the simplest yet most foundational equations in both algebra and geometry. It defines a linear relationship between two variables, \(x\) and \(y\), where \(m\) represents the slope (or gradient) of the line, and \(c\) is the y-intercept, the point where the line crosses the y-axis.

This article explores key concepts related to the straight line equation, interesting properties, and how we can use Haskell to implement some useful functions.

Understanding the Equation

The equation \(y = mx + c\) allows us to describe a straight line in a two-dimensional plane. Here’s a breakdown of its components:

  • \(m\): The slope, which measures how steep the line is. It’s defined as the change in \(y\) divided by the change in \(x\), or \(\frac{\Delta y}{\Delta x}\).
  • \(c\): The y-intercept, which is the value of \(y\) when \(x = 0\).

One of the key properties of this equation is that for every unit increase in \(x\), the value of \(y\) increases by \(m\). We can illustrate this behavior using some Haskell code.

Basic Line Function in Haskell

Let’s implement the basic straight line function in Haskell. This function will take \(m\), \(c\), and \(x\) as inputs and return the corresponding \(y\) value.

lineFunction :: Float -> Float -> Float -> Float
lineFunction m c x = m * x + c

This function calculates \(y\) for any given \(x\) using the slope \(m\) and y-intercept \(c\).
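
A quick check in GHCi (using lineFunction as defined above) shows the slope at work: with \(m = 2\) and \(c = 1\), every unit step in \(x\) raises \(y\) by 2:

ghci> lineFunction 2 1 0
1.0
ghci> lineFunction 2 1 1
3.0
ghci> lineFunction 2 1 2
5.0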

Parallel and Perpendicular Lines

An interesting aspect of lines is how they relate to each other. If two lines are parallel, they have the same slope. If two lines are perpendicular, the slope of one is the negative reciprocal of the other. In mathematical terms, if one line has a slope \(m_1\), the perpendicular line has a slope of \(-\frac{1}{m_1}\).

We can express this relationship in Haskell using a function to check if two lines are perpendicular.

arePerpendicular :: Float -> Float -> Bool
arePerpendicular m1 m2 = m1 * m2 == -1

This function takes two slopes and returns True if they are perpendicular and False otherwise.
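
For example, a line with slope 2 and a line with slope -0.5 are perpendicular, while two lines that share a slope are not:

ghci> arePerpendicular 2 (-0.5)
True
ghci> arePerpendicular 2 2
False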

Finding the Intersection of Two Lines

To find the point where two lines intersect, we need to solve the system of equations:

\[y = m_1 x + c_1\]
\[y = m_2 x + c_2\]

By setting the equations equal to each other, we can solve for \(x\) and then substitute the result into one of the equations to find \(y\). The formula for the intersection point is:

\[x = \frac{c_2 - c_1}{m_1 - m_2}\]

Here’s a Haskell function that calculates the intersection point of two lines:

intersection :: Float -> Float -> Float -> Float -> Maybe (Float, Float)
intersection m1 c1 m2 c2
    | m1 == m2  = Nothing -- Parallel lines never intersect
    | otherwise = let x = (c2 - c1) / (m1 - m2)
                      y = m1 * x + c1
                  in Just (x, y)

This function returns Nothing if the lines are parallel and Just (x, y) if the lines intersect.
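
Trying it out in GHCi, the lines \(y = x\) and \(y = -x + 2\) meet at \((1, 1)\), while two parallel lines give Nothing:

ghci> intersection 1 0 (-1) 2
Just (1.0,1.0)
ghci> intersection 1 0 1 5
Nothing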

Conclusion

The straight line equation \(y = mx + c\) is a simple but powerful tool in both mathematics and programming. We’ve explored how to implement the line equation in Haskell, find parallel and perpendicular lines, and calculate intersection points. Understanding these properties gives you a deeper appreciation of how linear relationships work, both in theory and in practice.

By writing these functions in Haskell, you can model and manipulate straight lines in code, extending these basic principles to more complex problems.

Quadratic equations

Introduction

The quadratic equation is one of the fundamental concepts in algebra and forms the basis of many more complex topics in mathematics and computer science. It has the general form:

\[ax^2 + bx + c = 0\]

where \(a\), \(b\), and \(c\) are constants, and \(x\) represents the unknown variable.

In this post, we’ll explore:

  • What the quadratic equation represents
  • How to solve it using the quadratic formula
  • How to implement this solution in Haskell

What Is a Quadratic Equation?

A quadratic equation is a second-degree polynomial equation. This means the highest exponent of the variable \(x\) is 2.

Quadratic equations typically describe parabolas when plotted on a graph.

The Quadratic Formula

The quadratic formula provides a method to find the values of \(x\) that satisfy the equation. The formula is:

\[x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\]

Here, the expression \(b^2 - 4ac\) is called the discriminant, and it plays a key role in determining the nature of the solutions:

  • If the discriminant is positive, the equation has two real and distinct roots.
  • If the discriminant is zero, the equation has one real (repeated) root.
  • If the discriminant is negative, the equation has two complex roots.

Step-by-Step Solution

  1. Calculate the Discriminant: The discriminant, \(\Delta\), is given by: \(\Delta = b^2 - 4ac\)

  2. Evaluate the Roots: Using the discriminant, you can find the roots by plugging the values into the quadratic formula: \(x_1 = \frac{-b + \sqrt{\Delta}}{2a}, \quad x_2 = \frac{-b - \sqrt{\Delta}}{2a}\)

If \(\Delta < 0\), the square root term involves imaginary numbers, leading to complex solutions.
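
As a concrete example, take \(x^2 - 3x + 2 = 0\), so \(a = 1\), \(b = -3\), and \(c = 2\):

\[\Delta = (-3)^2 - 4 \cdot 1 \cdot 2 = 1, \qquad x_1 = \frac{3 + 1}{2} = 2, \qquad x_2 = \frac{3 - 1}{2} = 1\]

The discriminant is positive, so there are two real and distinct roots. This is the same equation we hand to the Haskell solver below.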

Haskell Implementation

Now let’s translate this mathematical solution into a Haskell function. Haskell is a functional programming language with a strong emphasis on immutability and mathematical precision, making it an excellent choice for implementing mathematical algorithms.

Below, we’ll create a function quadraticSolver that:

  • Takes the coefficients \(a\), \(b\), and \(c\) as inputs.
  • Computes the discriminant.
  • Determines the nature of the roots based on the discriminant.
  • Returns the roots of the quadratic equation.
-- Haskell implementation of solving a quadratic equation
import Text.Printf (printf)

-- Function to solve the quadratic equation
quadraticSolver :: Double -> Double -> Double -> String
quadraticSolver a b c
    | discriminant > 0 = printf "Two real roots: x1 = %.2f, x2 = %.2f" x1 x2
    | discriminant == 0 = printf "One real root: x = %.2f" x1
    | otherwise = printf "Two complex roots: x1 = %.2f + %.2fi, x2 = %.2f - %.2fi" realPart imaginaryPart realPart imaginaryPart
  where
    discriminant = b^2 - 4 * a * c
    x1 = (-b + sqrt discriminant) / (2 * a)
    x2 = (-b - sqrt discriminant) / (2 * a)
    realPart = -b / (2 * a)
    imaginaryPart = sqrt (abs discriminant) / (2 * a)

-- Example usage
main :: IO ()
main = do
    putStrLn "Enter coefficients a, b, and c:"
    a <- readLn
    b <- readLn
    c <- readLn
    putStrLn $ quadraticSolver a b c

Code Breakdown:

  1. Imports: We import the Text.Printf module to format the output to two decimal places.

  2. quadraticSolver Function:
    • This function takes three arguments: \(a\), \(b\), and \(c\).
    • It computes the discriminant using the formula \(\Delta = b^2 - 4ac\).
    • It checks the value of the discriminant using Haskell’s guards (|), and based on its value, it computes the roots.
    • If the discriminant is negative, we compute the real and imaginary parts separately and display the complex roots in the form \(x = a + bi\).
  3. main Function:
    • The main function prompts the user to input the coefficients \(a\), \(b\), and \(c\).
    • It then calls quadraticSolver to compute and display the roots.

Example Run

Let’s assume we are solving the equation \(x^2 - 3x + 2 = 0\), where \(a = 1\), \(b = -3\), and \(c = 2\).

Enter coefficients a, b, and c:
1
-3
2
Two real roots: x1 = 2.00, x2 = 1.00

If we try solving the equation \(x^2 + 2x + 5 = 0\), where \(a = 1\), \(b = 2\), and \(c = 5\), we get complex roots:

Enter coefficients a, b, and c:
1
2
5
Two complex roots: x1 = -1.00 + 2.00i, x2 = -1.00 - 2.00i

Conclusion

The quadratic equation is a simple but powerful mathematical tool. In this post, we derived the quadratic formula, discussed how the discriminant affects the solutions, and implemented it in Haskell. The solution handles both real and complex roots elegantly, thanks to Haskell’s functional paradigm.

Basic 3D

Introduction

In this post, we’ll explore the foundations of 3D graphics, focusing on vector math, matrices, and transformations. By the end, you’ll understand how objects are transformed in 3D space and projected onto the screen. We’ll use Haskell for the code examples, as it closely resembles the mathematical operations involved.

Vectors

A 4D vector has four components: \(x\), \(y\), \(z\), and \(w\).

data Vec4 = Vec4 { x :: Double, y :: Double, z :: Double, w :: Double }
    deriving (Show, Eq)

In 3D graphics, we often work with 4D vectors (also called homogeneous coordinates) rather than 3D vectors. The extra dimension allows us to represent translations (which are not linear transformations) as matrix operations, keeping the math uniform.

A 4D vector is written as:

\[\boldsymbol{v} = \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}\]

Where:

  • \(x, y, z\) represent the position in 3D space
  • \(w\) is a homogeneous coordinate that allows us to apply translations and perspective transformations.

The extra \(w\)-component is crucial for distinguishing between points and directions (i.e., vectors). When \(w = 1\), the vector represents a point. When \(w = 0\), it represents a direction or vector.
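
To make that distinction explicit in code, we could add a pair of small helper constructors. Nothing later depends on these names; they’re just a convenience:

-- A point in space: w = 1, so translations apply to it
point :: Double -> Double -> Double -> Vec4
point px py pz = Vec4 px py pz 1

-- A direction: w = 0, so translations leave it untouched
direction :: Double -> Double -> Double -> Vec4
direction dx dy dz = Vec4 dx dy dz 0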

Operations

We need to perform various operations on vectors in 3D space (or 4D homogeneous space), including addition, subtraction, multiplication, dot products, and normalization.

Addition

Given two vectors \(\boldsymbol{a}\) and \(\boldsymbol{b}\):

\[\boldsymbol{a} + \boldsymbol{b} = \begin{bmatrix} a_x \\ a_y \\ a_z \\ a_w \end{bmatrix} + \begin{bmatrix} b_x \\ b_y \\ b_z \\ b_w \end{bmatrix} = \begin{bmatrix} a_x + b_x \\ a_y + b_y \\ a_z + b_z \\ a_w + b_w \end{bmatrix}\]
add :: Vec4 -> Vec4 -> Vec4
add (Vec4 ax ay az aw) (Vec4 bx by bz bw) = Vec4 (ax + bx) (ay + by) (az + bz) (aw + bw)

Subtraction

\[\boldsymbol{a} - \boldsymbol{b} = \begin{bmatrix} a_x \\ a_y \\ a_z \\ a_w \end{bmatrix} - \begin{bmatrix} b_x \\ b_y \\ b_z \\ b_w \end{bmatrix} = \begin{bmatrix} a_x - b_x \\ a_y - b_y \\ a_z - b_z \\ a_w - b_w \end{bmatrix}\]
sub :: Vec4 -> Vec4 -> Vec4
sub (Vec4 ax ay az aw) (Vec4 bx by bz bw) = Vec4 (ax - bx) (ay - by) (az - bz) (aw - bw)

Dot Product

The dot product of two 3D vectors \(\boldsymbol{a} \cdot \boldsymbol{b}\) gives a scalar value:

\[\boldsymbol{a} \cdot \boldsymbol{b} = a_x \cdot b_x + a_y \cdot b_y + a_z \cdot b_z\]
dot :: Vec4 -> Vec4 -> Double
dot (Vec4 ax ay az _) (Vec4 bx by bz _) = ax * bx + ay * by + az * bz

Cross Product

The cross product is a vector operation that takes two 3D vectors and returns a third vector that is orthogonal (perpendicular) to both of the input vectors. The cross product is commonly used in 3D graphics to calculate surface normals, among other things.

For two 3D vectors \(\boldsymbol{a}\) and \(\boldsymbol{b}\), the cross product \(\boldsymbol{a} \times \boldsymbol{b}\) is defined as:

\[\boldsymbol{a} \times \boldsymbol{b} = \begin{bmatrix} a_y \cdot b_z - a_z \cdot b_y \\ a_z \cdot b_x - a_x \cdot b_z \\ a_x \cdot b_y - a_y \cdot b_x \end{bmatrix}\]

This resulting vector is perpendicular to both \(\boldsymbol{a}\) and \(\boldsymbol{b}\).

To implement the cross product in Haskell, we will only operate on the \(x\), \(y\), and \(z\) components of a Vec4 (ignoring \(w\)) since the cross product is defined for 3D vectors.

-- Compute the cross product of two 3D vectors
cross :: Vec4 -> Vec4 -> Vec4
cross (Vec4 ax ay az _) (Vec4 bx by bz _) =
    Vec4 ((ay * bz) - (az * by))  -- x component
         ((az * bx) - (ax * bz))  -- y component
         ((ax * by) - (ay * bx))  -- z component
         0                        -- w is zero for a direction vector
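
As a quick sanity check in GHCi (using the cross function defined above), crossing the X axis direction with the Y axis direction gives the Z axis direction:

ghci> cross (Vec4 1 0 0 0) (Vec4 0 1 0 0)
Vec4 {x = 0.0, y = 0.0, z = 1.0, w = 0.0}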

Length

The length or magnitude of a vector \(\boldsymbol{v}\) is:

\[\lVert \boldsymbol{v} \rVert = \sqrt{x^2 + y^2 + z^2}\]
len :: Vec4 -> Double
len (Vec4 x y z _) = sqrt (x * x + y * y + z * z)

Normalization

To normalize a vector is to scale it so that its length is 1:

\[\boldsymbol{v}_{\text{norm}} = \frac{\boldsymbol{v}}{\lVert \boldsymbol{v} \rVert}\]
normalize :: Vec4 -> Vec4
normalize v@(Vec4 x y z w) =
    let l = len v
    in Vec4 (x / l) (y / l) (z / l) w  -- only the spatial components are scaled; w is left as-is
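
For example, normalizing a vector whose spatial length is 5 (a quick GHCi check using the definitions above):

ghci> normalize (Vec4 3 4 0 0)
Vec4 {x = 0.6, y = 0.8, z = 0.0, w = 0.0}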

Matrices

A 4x4 matrix consists of 16 elements. We’ll represent it as a flat structure with 16 values:

\[M = \begin{bmatrix} m_{00} & m_{01} & m_{02} & m_{03} \\ m_{10} & m_{11} & m_{12} & m_{13} \\ m_{20} & m_{21} & m_{22} & m_{23} \\ m_{30} & m_{31} & m_{32} & m_{33} \end{bmatrix}\]

In Haskell, we define it as:

data Mat4 = Mat4 { m00 :: Double, m01 :: Double, m02 :: Double, m03 :: Double
                 , m10 :: Double, m11 :: Double, m12 :: Double, m13 :: Double
                 , m20 :: Double, m21 :: Double, m22 :: Double, m23 :: Double
                 , m30 :: Double, m31 :: Double, m32 :: Double, m33 :: Double
                 } deriving (Show, Eq)

In 3D graphics, transformations are applied to objects using 4x4 matrices. These matrices allow us to perform operations like translation, scaling, and rotation.

Operations

Addition

Adding two matrices \(A\) and \(B\) is done element-wise:

\[A + B = \begin{bmatrix} a_{00} + b_{00} & a_{01} + b_{01} & \dots \\ a_{10} + b_{10} & \dots & \dots \end{bmatrix}\]
addM :: Mat4 -> Mat4 -> Mat4
addM (Mat4 a00 a01 a02 a03 a10 a11 a12 a13 a20 a21 a22 a23 a30 a31 a32 a33)
     (Mat4 b00 b01 b02 b03 b10 b11 b12 b13 b20 b21 b22 b23 b30 b31 b32 b33) =
    Mat4 (a00 + b00) (a01 + b01) (a02 + b02) (a03 + b03)
         (a10 + b10) (a11 + b11) (a12 + b12) (a13 + b13)
         (a20 + b20) (a21 + b21) (a22 + b22) (a23 + b23)
         (a30 + b30) (a31 + b31) (a32 + b32) (a33 + b33)

Multiplication

Multiplying two matrices \(A\) and \(B\):

\[C = A \cdot B\]

Where each element \(c_{ij}\) of the resulting matrix is calculated as:

\[c_{ij} = a_{i0} \cdot b_{0j} + a_{i1} \cdot b_{1j} + a_{i2} \cdot b_{2j} + a_{i3} \cdot b_{3j}\]
mulM :: Mat4 -> Mat4 -> Mat4
mulM (Mat4 a00 a01 a02 a03 a10 a11 a12 a13 a20 a21 a22 a23 a30 a31 a32 a33)
     (Mat4 b00 b01 b02 b03 b10 b11 b12 b13 b20 b21 b22 b23 b30 b31 b32 b33) =
    Mat4 (a00 * b00 + a01 * b10 + a02 * b20 + a03 * b30)
         (a00 * b01 + a01 * b11 + a02 * b21 + a03 * b31)
         (a00 * b02 + a01 * b12 + a02 * b22 + a03 * b32)
         (a00 * b03 + a01 * b13 + a02 * b23 + a03 * b33)
         (a10 * b00 + a11 * b10 + a12 * b20 + a13 * b30)
         (a10 * b01 + a11 * b11 + a12 * b21 + a13 * b31)
         (a10 * b02 + a11 * b12 + a12 * b22 + a13 * b32)
         (a10 * b03 + a11 * b13 + a12 * b23 + a13 * b33)
         (a20 * b00 + a21 * b10 + a22 * b20 + a23 * b30)
         (a20 * b01 + a21 * b11 + a22 * b21 + a23 * b31)
         (a20 * b02 + a21 * b12 + a22 * b22 + a23 * b32)
         (a20 * b03 + a21 * b13 + a22 * b23 + a23 * b33)
         (a30 * b00 + a31 * b10 + a32 * b20 + a33 * b30)
         (a30 * b01 + a31 * b11 + a32 * b21 + a33 * b31)
         (a30 * b02 + a31 * b12 + a32 * b22 + a33 * b32)
         (a30 * b03 + a31 * b13 + a32 * b23 + a33 * b33)

Vector Multiply

We transform a vector by a matrix using a multiply operation.

\[\boldsymbol{v'} = M \cdot \boldsymbol{v}\]
-- Multiplying a 4D vector by a 4x4 matrix
mulMV :: Mat4 -> Vec4 -> Vec4
mulMV (Mat4 m00 m01 m02 m03 m10 m11 m12 m13 m20 m21 m22 m23 m30 m31 m32 m33)
    (Vec4 x y z w) =
        Vec4 (m00 * x + m01 * y + m02 * z + m03 * w)
             (m10 * x + m11 * y + m12 * z + m13 * w)
             (m20 * x + m21 * y + m22 * z + m23 * w)
             (m30 * x + m31 * y + m32 * z + m33 * w)

3D Transformations

In 3D graphics, we apply transformations like translation, scaling, and rotation using matrices. These transformations are applied to 4D vectors, and the operations are represented as matrix multiplications.

Identity Matrix

The identity matrix is a 4x4 matrix that leaves a vector unchanged when multiplied:

\[I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]
identity :: Mat4
identity = 
    Mat4 1 0 0 0
         0 1 0 0
         0 0 1 0
         0 0 0 1

Translation Matrix

To translate a point by \((t_x, t_y, t_z)\), we use the translation matrix:

\[T = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}\]
translation :: Double -> Double -> Double -> Mat4
translation tx ty tz = 
    Mat4 1 0 0 tx
         0 1 0 ty
         0 0 1 tz
         0 0 0 1
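
A quick GHCi check (using mulMV and translation as defined here) shows why the \(w\) component matters: a point with \(w = 1\) picks up the translation, while a direction with \(w = 0\) is left untouched:

ghci> mulMV (translation 5 0 0) (Vec4 1 2 3 1)
Vec4 {x = 6.0, y = 2.0, z = 3.0, w = 1.0}
ghci> mulMV (translation 5 0 0) (Vec4 1 2 3 0)
Vec4 {x = 1.0, y = 2.0, z = 3.0, w = 0.0}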

Scale Matrix

Scaling a vector by \(s_x, s_y, s_z\) is done using the following matrix:

\[S = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]
scale :: Double -> Double -> Double -> Mat4
scale sx sy sz = 
    Mat4 sx 0  0  0
         0  sy 0  0
         0  0  sz 0
         0  0  0  1

Rotation Matrix

In 3D graphics, we frequently need to rotate objects around the X, Y, and Z axes. Each axis has its own corresponding rotation matrix, which we use to apply the rotation transformation to points in 3D space.

A rotation around the X-axis by an angle \(\theta\) is represented by the following matrix:

\[R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \theta & -\sin \theta & 0 \\ 0 & \sin \theta & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]

A rotation around the Y-axis by an angle \(\theta\) is represented by the following matrix:

\[R_y(\theta) = \begin{bmatrix} \cos \theta & 0 & \sin \theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin \theta & 0 & \cos \theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]

A rotation around the Z-axis by an angle \(\theta\) is represented by the following matrix:

\[R_z(\theta) = \begin{bmatrix} \cos \theta & -\sin \theta & 0 & 0 \\ \sin \theta & \cos \theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\]

Combining Rotation Matrices

To rotate an object in 3D space about multiple axes, we can multiply the individual rotation matrices. The order of multiplication is crucial since matrix multiplication is not commutative. Typically, we perform rotations in the order of Z, then Y, then X (if required).

\[R = R_x(\theta_x) \cdot R_y(\theta_y) \cdot R_z(\theta_z)\]

Rotation Matrices in Haskell

Let’s implement the rotation matrices for the X, Y, and Z axes in Haskell:

rotationX :: Double -> Mat4
rotationX theta = 
    Mat4 1 0           0            0
         0 (cos theta) (-sin theta) 0
         0 (sin theta) (cos theta)  0
         0 0           0            1

rotationY :: Double -> Mat4
rotationY theta = 
    Mat4 (cos theta)  0 (sin theta) 0
         0            1 0           0
         (-sin theta) 0 (cos theta) 0
         0            0 0           1

rotationZ :: Double -> Mat4
rotationZ theta = 
    Mat4 (cos theta) (-sin theta) 0 0
         (sin theta) (cos theta)  0 0
         0           0            1 0
         0           0            0 1

Example: Rotating an Object

To apply a rotation to an object, you can combine the rotation matrices and multiply them by the object’s position vector. For instance, to rotate a point by \(\theta_x\), \(\theta_y\), and \(\theta_z\), you can multiply the corresponding matrices:

-- Rotate a point by theta_x, theta_y, and theta_z
let rotationMatrix = rotationX thetaX `mulM` rotationY thetaY `mulM` rotationZ thetaZ
let rotatedPoint = mulMV rotationMatrix pointVec

3D Transformations and Projection

Local vs World Coordinates

When dealing with 3D objects, we distinguish between local coordinates (relative to an object) and world coordinates (relative to the entire scene). Vectors are transformed from local to world coordinates by multiplying them by transformation matrices.
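
As a sketch of what this looks like with the matrices defined earlier (the specific translation, rotation, and scale values are purely illustrative), we can compose a model matrix and use it to move a point from local space into world space:

-- Build a local-to-world (model) matrix: scale first, then rotate, then translate
localToWorld :: Mat4
localToWorld = translation 10 0 (-5) `mulM` (rotationY (pi / 4) `mulM` scale 2 2 2)

-- A point in the object's local space (w = 1 marks it as a point)
localPoint :: Vec4
localPoint = Vec4 1 0 0 1

-- The same point expressed in world coordinates
worldPoint :: Vec4
worldPoint = mulMV localToWorld localPoint

Since matrix multiplication is not commutative, swapping the translation and rotation here would put the point somewhere else entirely.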

Projection Calculation

To project a 3D point onto a 2D screen, we use a projection matrix. The projection matrix transforms 3D coordinates into 2D coordinates by applying a perspective transformation.

A simple perspective projection matrix looks like this:

\[P = \begin{bmatrix} \frac{1}{\text{aspect}} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{z_f + z_n}{z_n - z_f} & \frac{2z_f z_n}{z_n - z_f} \\ 0 & 0 & -1 & 0 \end{bmatrix}\]

Where:

  • \(z_f\) is the far clipping plane
  • \(z_n\) is the near clipping plane
  • \(\text{aspect}\) is the aspect ratio of the screen
projection :: Double -> Double -> Double -> Double -> Mat4
projection fov aspect near far =
    let s = 1 / tan (fov / 2)
    in Mat4 (s / aspect) 0 0                                0
            0            s 0                                0
            0            0 ((-far - near) / (far - near))   ((-2 * far * near) / (far - near))
            0            0 (-1)                             0

Reducing a 4D Vector to 2D Screen Coordinates

In 3D graphics, we often work with 4D vectors in homogeneous coordinates. To display a 3D point on a 2D screen, we need to project that point using a projection matrix and then convert the resulting 4D vector into 2D coordinates that we can draw on the screen.

Here’s how this process works:

Step 1: Apply the Projection Matrix

We start with a 4D vector \(\boldsymbol{v}\) in homogeneous coordinates:

\[\boldsymbol{v} = \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}\]

We apply the projection matrix \(P\), which transforms the 4D point into clip space (a space where coordinates can be projected to the screen).

The projection matrix looks something like this for perspective projection:

\[P = \begin{bmatrix} \frac{1}{\text{aspect}} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\frac{fov}{2})} & 0 & 0 \\ 0 & 0 & \frac{z_f + z_n}{z_n - z_f} & \frac{2z_f z_n}{z_n - z_f} \\ 0 & 0 & -1 & 0 \end{bmatrix}\]

Multiplying \(\boldsymbol{v}\) by \(P\) gives us the transformed vector:

\[\boldsymbol{v'} = P \cdot \boldsymbol{v} = \begin{bmatrix} x' \\ y' \\ z' \\ w' \end{bmatrix}\]

Step 2: Perspective Divide

To convert the 4D vector \(\boldsymbol{v'}\) to 3D space, we perform the perspective divide. This means dividing the \(x'\), \(y'\), and \(z'\) components by the \(w'\) component.

The resulting 3D point \(\boldsymbol{v_{3D}}\) is:

\[\boldsymbol{v_{3D}} = \begin{bmatrix} \frac{x'}{w'} \\ \frac{y'}{w'} \\ \frac{z'}{w'} \end{bmatrix}\]

Step 3: Convert to Screen Coordinates

To get the final 2D screen coordinates, we need to convert the 3D point into normalized device coordinates (NDC), which range from -1 to 1. The screen coordinates \((x_{\text{screen}}, y_{\text{screen}})\) are then obtained by scaling these values to the screen dimensions:

\[x_{\text{screen}} = \left( \frac{x_{3D} + 1}{2} \right) \cdot \text{width}\] \[y_{\text{screen}} = \left( \frac{1 - y_{3D}}{2} \right) \cdot \text{height}\]

The factor \(\frac{x_{3D} + 1}{2}\) maps the normalized \(x\)-coordinate from the range [-1, 1] to [0, 1], and multiplying by the screen width gives us the pixel position. The same applies for \(y_{\text{screen}}\), but we invert the \(y_{3D}\) coordinate to account for the fact that screen coordinates typically have the origin at the top-left corner, whereas the NDC system has the origin at the center.

Putting it All Together in Haskell

Here’s how you can perform this transformation in Haskell:

-- Given a projection matrix and a 4D vector, project the vector to screen coordinates
projectToScreen :: Mat4 -> Vec4 -> Double -> Double -> (Double, Double)
projectToScreen projectionMatrix vec width height =
    let Vec4 x' y' z' w' = mulMV projectionMatrix vec  -- Apply projection matrix
        x3D = x' / w'                                  -- Perspective divide
        y3D = y' / w'
        -- Convert from NDC to screen coordinates
        xScreen = (x3D + 1) / 2 * width
        yScreen = (1 - y3D) / 2 * height
    in (xScreen, yScreen)

Example

Suppose we have the following vector and projection matrix:

let vec = Vec4 1 1 1 1  -- 3D point (1, 1, 1)
let projectionMatrix = projection (pi / 2) (16/9) 0.1 1000  -- 90 degree field of view (in radians), aspect ratio, near/far planes
let (xScreen, yScreen) = projectToScreen projectionMatrix vec 1920 1080  -- Screen resolution

This will give you the screen coordinates \(x_{\text{screen}}\) and \(y_{\text{screen}}\), where the 3D point \((1, 1, 1)\) will be projected on a 1920x1080 display.

Conclusion

This has been some of the basic 3D concepts presented through Haskell. In future posts, we’ll use this code to create some basic animations on screen.