
Traits vs Typeclasses - A Deep Comparison

Introduction

If you’ve spent time in both Rust and Haskell, you’ve likely noticed that traits and typeclasses seem eerily similar. In fact, many people describe Rust traits as “typeclasses in disguise.”

But that’s only the beginning.

While traits and typeclasses both offer ad-hoc polymorphism — enabling different types to share behavior — the details around coherence, inference, dispatch, extensibility, and even type-level programming are very different.

In this post, we’ll dig into the core similarities and differences, and walk through side-by-side examples that highlight the strengths (and limitations) of both.

What Are We Talking About?

Let’s start with some basic definitions:

  • A trait in Rust defines a set of methods or behavior that types can implement.
  • A typeclass in Haskell defines a set of functions that a type must implement to be considered part of that class.

At a glance, they look almost identical:

Rust:

trait Printable {
    fn print(&self);
}

Haskell:

class Printable a where
    print :: a -> IO ()

Implementation: Explicit vs Global

In Rust, you explicitly implement traits per type:

impl Printable for i32 {
    fn print(&self) {
        println!("{}", self);
    }
}

In Haskell, typeclass instances are global:

instance Printable Int where
    print x = putStrLn (show x)

This is one of the first major differences:

  • Rust: Orphan rules prevent impls unless either the trait or type is defined locally.
  • Haskell: Instances are globally coherent — there can only be one per type.

Dispatch: Static vs Dynamic

Rust allows both static and dynamic dispatch:

// Static dispatch (monomorphized at compile time)
fn debug<T: Printable>(x: T) {
    x.print();
}

// Dynamic dispatch via trait objects
fn debug_dyn(x: &dyn Printable) {
    x.print();
}
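
Assuming the Printable trait and the i32 impl from the previous section are in scope, a quick (hypothetical) main shows both forms side by side:

fn main() {
    debug(5);       // static: a dedicated copy of debug for i32 is generated at compile time
    debug_dyn(&5);  // dynamic: the call goes through the vtable at runtime
}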

Haskell compiles typeclass constraints into dictionary passing. The compiler supplies a record of the class’s functions (a dictionary) wherever a constraint appears, resolving it at compile time whenever the concrete type is known:

debug :: Printable a => a -> IO ()
debug x = print x

There is no runtime polymorphism in the sense of trait objects in Haskell, although an unspecialized polymorphic call still receives its dictionary as a hidden argument.

Type Inference

In Haskell, type inference is rich and automatic:

addOne :: Num a => a -> a
addOne x = x + 1

Haskell will infer the constraint Num a based on the use of +.

In Rust, type annotations are often required — especially in generic code:

// We even need an extra bound (From<u8>) just to conjure the literal 1 as a T.
fn add_one<T: std::ops::Add<Output = T> + From<u8>>(x: T) -> T {
    x + T::from(1)
}

Rust tends to prefer explicitness, while Haskell leans into inference.

Higher-Kinded Types

Here’s where the two really diverge.

Haskell supports higher-kinded types, enabling expressive abstractions like Functor, Applicative, and Monad:

class Functor f where
    fmap :: (a -> b) -> f a -> f b

Rust doesn’t currently support higher-kinded types (HKT), though you can simulate some of this with associated types, macros, or GATs (generic associated types).

This limitation makes certain patterns in Rust more awkward — or outright impossible — compared to Haskell.
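
To make that concrete, here’s a rough Rust sketch of a Functor-like abstraction using a generic associated type (GATs are stable as of Rust 1.65). It’s one possible encoding, not a standard library API:

trait Mappable {
    type Item;
    type Output<U>;

    fn map<U>(self, f: impl FnMut(Self::Item) -> U) -> Self::Output<U>;
}

impl<T> Mappable for Option<T> {
    type Item = T;
    type Output<U> = Option<U>;

    fn map<U>(self, mut f: impl FnMut(T) -> U) -> Option<U> {
        match self {
            Some(x) => Some(f(x)),
            None => None,
        }
    }
}

// Generic code written against the abstraction, for "any mappable container of i32".
fn double_inner<M: Mappable<Item = i32>>(m: M) -> M::Output<i32> {
    m.map(|x| x * 2)
}

fn main() {
    println!("{:?}", double_inner(Some(21))); // Some(42)
}

It works, but notice that Output<U> has to be spelled out per implementation; the compiler can’t abstract over Option itself the way Haskell abstracts over f.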

Overlapping and Flexible Instances

Haskell allows overlapping and multi-parameter instances (with extensions):

class Convert a b where
    convert :: a -> b

Rust has no support for overlapping impls. Every impl must be unambiguous: the coherence rules reject two impls that could apply to the same type, and the orphan rule restricts where impls may be written in the first place. Both are enforced at compile time.

Trait Objects vs Typeclass Dictionaries

Here’s a behind-the-scenes peek:

  • Rust: &dyn Trait compiles to a pointer + vtable.
  • Haskell: a constraint like C a => ... becomes an implicit dictionary argument, conceptually similar to a vtable but selected by the compiler rather than attached to the value.

When the concrete type is known, GHC can resolve (and often specialize away) the dictionary entirely, making dispatch effectively free; what you give up is trait-object-style flexibility at runtime.
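
To make the dictionary idea concrete, here’s a small Rust sketch, purely illustrative rather than what either compiler actually emits, where the “dictionary” is an explicit struct of function pointers passed by hand:

// An explicit "dictionary" for the Printable behaviour.
struct PrintableDict<T> {
    print: fn(&T),
}

// The "instance" for i32 is just a concrete function...
fn print_i32(x: &i32) {
    println!("{}", x);
}

// ...and the constrained function receives the dictionary as an extra argument.
fn debug_with_dict<T>(dict: &PrintableDict<T>, x: &T) {
    (dict.print)(x);
}

fn main() {
    let printable_i32 = PrintableDict { print: print_i32 };
    debug_with_dict(&printable_i32, &42);
}

In Haskell the compiler builds and threads that dictionary for you; with &dyn Printable, Rust instead attaches the equivalent vtable to the value itself.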

Example: A Shared Interface

Let’s implement a toy AddOne behavior in both:

Rust:

trait AddOne {
    fn add_one(&self) -> Self;
}

impl AddOne for i32 {
    fn add_one(&self) -> Self {
        self + 1
    }
}

Haskell:

class AddOne a where
    addOne :: a -> a

instance AddOne Int where
    addOne x = x + 1

Nearly identical — but the differences we’ve seen so far affect how you use these abstractions in larger codebases.

So, Which Is Better?

That depends on what you value:

Feature                  Rust Traits    Haskell Typeclasses
Explicit control         Yes            Partial
Higher-kinded types      Not yet        Core feature
Inference                Sometimes      Strong
Localized coherence      Yes            Global-only
Overlapping instances    No             With extensions
Runtime polymorphism     Via dyn        Not supported

Final Thoughts

Rust’s trait system is heavily influenced by Haskell’s typeclasses, but it trades some flexibility for stronger guarantees around coherence, locality, and performance. If you want maximum abstraction power, Haskell wins. If you want performance, predictability, and control — Rust is often a better fit.

Both systems are brilliant in their own way — and understanding both gives you a deeper insight into how powerful type systems can unlock both correctness and expressiveness.

Algebraic Effects in Modern Languages

Introduction

Programming languages have long struggled with how to represent side effects — actions like printing to the console, handling exceptions, or managing state. From exceptions to async/await, from monads to callbacks, the industry has iterated through many paradigms to isolate or compose effectful behavior.

But there’s a new player in town: algebraic effects. Once a theoretical construct discussed in type theory papers, they’re now making their way into real-world languages like Eff, Koka, Multicore OCaml, and even Haskell (via libraries). This post dives into what algebraic effects are, why they matter, and how modern languages are putting them to work.

The Problem With Traditional Control Flow

Most languages bake side effects deep into their semantics. Consider these examples:

  • Exceptions break flow but are hard to compose.
  • Async/await adds sugar but doesn’t unify with other control patterns.
  • Monads (in Haskell and friends) offer composability but can be verbose and hard to stack.

You often end up tightly coupling your program logic with the mechanism that implements side effects. For example, what if you want to switch how logging is done — or intercept all state mutations? In traditional paradigms, that typically requires invasive changes.

Enter Algebraic Effects

Algebraic effects offer a clean abstraction: you declare an operation like Print or Throw, and you handle it separately from where it’s invoked. Think of them as resumable exceptions — but first-class and composable.

There are two parts:

  1. Effect operations – like Log("Hello") or Choose(1, 2)
  2. Effect handlers – define how to interpret or respond to those operations

Here’s a conceptual example:

operation Log : String -> Unit

handler ConsoleLogger {
  handle Log(msg) => print(msg)
}

handle {
  Log("Hello")
  Log("World")
} with ConsoleLogger

The code requests the effect, and the handler interprets it.

This separation makes effects modular and swappable.

Under the Hood: Continuations and Handlers

To implement algebraic effects, a language usually relies on delimited continuations — the ability to capture “the rest of the computation” when an effect is invoked, and then resume it later.

Think of it like pausing the program, giving control to a handler, and optionally continuing from where you left off.

Let’s break it down.

What Happens at Runtime?

Suppose we run this (in a made-up language):

effect Log : String -> Unit

handle {
  Log("step 1")
  Log("step 2")
  Log("done")
} with ConsoleLogger

The runtime treats Log("step 1") as a request rather than a built-in action.

When it hits that line:

  1. It pauses execution at the Log point.
  2. It captures the continuation — i.e., everything that comes after the Log("step 1").
  3. It gives control to the ConsoleLogger handler.
  4. The handler decides what to do:
    • Call print("step 1")
    • Resume the captured continuation to proceed to Log("step 2")

This “pause-and-resume” behavior is the key.

Visualizing With a Continuation

Let’s walk through this with a simplified stack trace:

Before the first Log("step 1"):

handle {
  [Log("step 1"); Log("step 2"); Log("done")]
} with ConsoleLogger

When Log("step 1") is reached, the continuation is:

continuation = {
  Log("step 2")
  Log("done")
}

The handler receives the message "step 1" and the continuation. It can:

  • Resume it once (like normal flow)
  • Discard it (like throwing an exception)
  • Resume it multiple times (like a forked computation)

How This Explains Exceptions

Exceptions are a special case of algebraic effects — with no continuation.

Throwing says: stop here, find a handler up the call stack, and don’t resume.

Let’s define a custom effect Throw(msg):

effect Throw : String -> Never

handle {
  if error {
    Throw("bad input")
  }
  print("This will never run")
} with ExceptionHandler

In this case, the handler intercepts Throw, but never resumes the continuation. The program takes a different branch.

💡 Remember: effect handlers don’t have to resume; they define the rules.

How This Explains I/O

Now suppose we want to model an I/O operation:

effect ReadLine : Unit -> String

handle {
  let name = ReadLine()
  Log("Hi " + name)
} with {
  handle ReadLine() => "Alice"
  handle Log(msg) => print(msg)
}

Here, ReadLine is not tied to any global input stream. It’s an abstract operation that the handler chooses how to interpret — maybe it prompts the user, maybe it returns a mock value.

🧪 Perfect for testing: handlers let you swap out real I/O with fake data. You don’t need to patch or stub anything; just handle the effect differently.

The continuation gets resumed with the string "Alice", and proceeds to log "Hi Alice".

How This Explains Async/Await

Let’s look at an async-style effect: Sleep(ms). We could simulate async behavior with handlers and continuations:

effect Sleep : Int -> Unit

handle {
  Log("Start")
  Sleep(1000)
  Log("End")
} with AsyncHandler

When the program hits Sleep(1000), it:

  1. Captures the continuation (Log("End"))
  2. Asks the handler to delay for 1000 ms
  3. When the delay completes, the handler resumes the continuation

So in an async-capable runtime, Sleep could enqueue the continuation in a task queue — very similar to await.

Effect Flow

Let’s visualize the execution:

graph TD
  A[Program Starts] --> B[Perform Log]
  B --> C[Handler Receives Effect and Continuation]
  C --> D[Handler Prints Hello]
  D --> E[Handler Resumes Continuation]
  E --> F[Next Effect or End]

Each effect call yields control to its handler, which decides what to do and when to resume.

Summary

Algebraic effects give you a way to pause execution at key points and delegate the decision to an external handler. This lets you:

  • Model exceptions (Throw with no resume)
  • Emulate async/await (Sleep with delayed resume)
  • Intercept I/O or tracing (Log, ReadLine, etc.)
  • Compose multiple effects together (logging + state + error handling)

The idea is powerful because you capture just enough of the stack to resume — not the whole program, not the whole thread — just a clean slice.

This is the beating heart of algebraic effects: capturable, resumable, programmable control flow.
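
To ground the idea in ordinary code, here’s a minimal Rust sketch (the names and shapes are mine, not a real effects library) that models a computation as a value which either finishes or performs a Log effect, handing the handler an explicit continuation to resume:

// A computation step: either a final result, or a request to log a message
// together with "the rest of the computation" (the continuation).
enum Step<T> {
    Done(T),
    Log(String, Box<dyn FnOnce() -> Step<T>>),
}

// The effectful program: logs twice, then produces 42.
fn program() -> Step<i32> {
    Step::Log("step 1".into(), Box::new(|| {
        Step::Log("step 2".into(), Box::new(|| Step::Done(42)))
    }))
}

// A ConsoleLogger-style handler: print each message, then resume the continuation.
fn run_console<T>(mut step: Step<T>) -> T {
    loop {
        match step {
            Step::Done(value) => return value,
            Step::Log(msg, resume) => {
                println!("{}", msg);
                step = resume(); // resume the captured continuation
            }
        }
    }
}

fn main() {
    let result = run_console(program());
    println!("result = {}", result);
}

A different handler could simply drop the continuation, giving exception-like behaviour, all without touching program; resuming more than once would need a continuation type that can be cloned.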

Examples Across Languages

Let’s look at how modern languages express algebraic effects.

Eff (by Andrej Bauer)

Eff is a small experimental language built around effects.

effect Choose : (int * int) -> int

let choose_handler = handler {
  val x -> x
  | Choose(x, y) k -> k(x) + k(y)
}

with choose_handler handle {
  let result = Choose(1, 2)
  result * 10
}

This handler resumes the continuation twice — once with 1 and once with 2 — and adds the results. Very cool.

Koka

Koka (by Daan Leijen at Microsoft) is a strongly typed language where every function explicitly declares its effects.

function divide(x: int, y: int) : exn int {
  if (y == 0) throw("divide by zero")
  else x / y
}

Koka tracks effects statically in the type system — you can see exn in the return type above.

OCaml with Multicore Support

Multicore OCaml added support for effects using new syntax:

effect ReadLine : string

let read_input () = perform ReadLine

let result =
  match read_input () with
  | line -> line
  | effect ReadLine k -> continue k "mocked input"

You can install handlers and intercept effects using pattern matching.

Haskell (with polysemy or freer-simple)

Algebraic effects in Haskell are expressed via libraries.

data Log m a where
  LogMsg :: String -> Log m ()

runLogToIO :: Member (Embed IO) r => Sem (Log ': r) a -> Sem r a
runLogToIO = interpret (\case
  LogMsg s -> embed (putStrLn s))

These libraries emulate effects using GADTs and free monads under the hood, offering a composable way to layer side effects.

Why Use Algebraic Effects?

  • Separation of concerns – pure logic stays free from effect details
  • Composable – you can layer state, logging, exceptions, etc.
  • Testable – effects can be mocked or redirected
  • Flexible control flow – resumable exceptions, nondeterminism, backtracking

They’re especially attractive for interpreters, DSLs, async runtimes, and functional backends.

The Downsides

Of course, there are tradeoffs:

  • Runtime overhead – stack capturing can be expensive
  • Complexity – debugging and stack traces are harder
  • Still experimental – limited tooling, especially in statically typed systems
  • Compiler support – not many mainstream languages have full support

But the ideas are gaining traction, and you can expect to see more of them in new languages (and maybe in existing ones like JavaScript or Swift).

The Future of Effects

Algebraic effects could fundamentally change how we write software:

  • Async/await might become just an effect
  • Logging, tracing, and observability could become pluggable
  • Pure functions could request effects without being impure

This vision aligns with a long-standing dream in language design: orthogonal, composable effects that don’t compromise reasoning.

Wrapping Up

Algebraic effects are still a frontier — but a promising one. They offer a middle ground between pure functions and side-effect-laden imperative code. By letting you request an effect and handle it elsewhere, they make programs easier to test, modify, and reason about.

Whether you’re writing interpreters, backend services, or just experimenting with new paradigms, algebraic effects are well worth exploring. The future of control flow may be algebraic — and the best part is, it’s just getting started.

Getting NGINX to Do Things

Introduction

NGINX is famous for serving static content and handling reverse proxies — but it can do a lot more. In this post, we’re going to explore three “power moves” with NGINX:

  • Running a reverse proxy to local services
  • Embedding Lua scripts using OpenResty
  • Talking to local services over Unix domain sockets

By the end, you’ll be able to glue together web services like a pro — no Node.js middleman required.

Setup

Before we begin, we’ll set up your local development environment so that you can experiment with a few of these configurations. We’ll use Docker and the OpenResty image to simplify things.

I’ve created a directory structure that looks like this:

.
├── conf.d/
├── lua/
└── nginx.conf

The nginx.conf file has some boilerplate in it:

worker_processes 1;

events {
    worker_connections 1024;
}

http {
    lua_package_path "/usr/local/openresty/nginx/lua/?.lua;;";

    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    keepalive_timeout  65;

    include /etc/nginx/conf.d/*.conf;
}

We include the lua_package_path directive to tell OpenResty where to find any Lua modules that we’ll add in.

We’ll build this directory structure out as we go. We’ll start our proxy server at the root of that directory structure with the following command:

docker run --rm -it \
    -p 8080:8080 \
    -v "$PWD/nginx.conf:/usr/local/openresty/nginx/conf/nginx.conf:ro" \
    -v "$PWD/conf.d/basic.conf:/etc/nginx/conf.d/default.conf:ro" \
    openresty/openresty:alpine

As we need, we’ll mount in more configurations and modules.

Now that we’ve got a testbed to work with, we can check out a few of these different configurations.

Reverse Proxy Basics

Let’s say you’ve got a service running locally on port 5000, and you want to expose it at https://example.com/api/.

Here’s a minimal NGINX config:

server {
    listen 8080;

    location /api/ {
        proxy_pass http://localhost:5000/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

A couple of points to note here:

  • That trailing slash on proxy_pass matters.
  • Use proxy_set_header to preserve client info.

This setup allows you to do things like offload your SSL/TLS termination onto NGINX rather than dealing with it inside your application. Because the reverse proxy now controls the flow of traffic, your overall system design becomes a lot more flexible should you need to pivot in future.

I put this configuration into a file called ./conf.d/basic.conf. I run this with the following:

docker run --rm -it \
    -p 8080:8080 \
    -v "$PWD/nginx.conf:/usr/local/openresty/nginx/conf/nginx.conf:ro" \
    -v "$PWD/conf.d/basic.conf:/etc/nginx/conf.d/default.conf:ro" \
    openresty/openresty:alpine

Sending a curl request should fail (if you’re like me and you don’t have a service running on port 5000):

curl http://localhost:8080/api/test

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>openresty/1.27.1.2</center>
</body>
</html>

This is all well and good for basic reverse proxying. What if you need a little more functionality at your proxy? You may want some arbitrary logic. In order to do this, you need some more help.

Lua + OpenResty: Custom Logic at the Edge

Want conditional logic? Inline transformations? Run Lua scripts inside NGINX using OpenResty.

Here’s an example that rewrites responses:

location /hello {
    content_by_lua_block {
        ngx.say("Hello from Lua!")
    }
}

Save this into ./conf.d/lua.conf and then we can get it running with the following:

docker run --rm -it \
  -p 8080:8080 \
  -v "$PWD/nginx.conf:/usr/local/openresty/nginx/conf/nginx.conf:ro" \
  -v "$PWD/conf.d/lua.conf:/etc/nginx/conf.d/default.conf:ro" \
  openresty/openresty:alpine

A simple request to the /hello endpoint:

curl http://localhost:8080/hello
Hello from Lua!

This demonstrates the usage of content_by_lua_block to provide content to the response.

Or maybe you want to inspect a header before proxying:

location /auth {
    access_by_lua_block {
        local token = ngx.var.http_authorization
        if token ~= "Bearer secrettoken" then
            ngx.status = 401
            ngx.say("Unauthorized")
            return ngx.exit(401)
        end
    }

    proxy_pass http://localhost:7000/;
}

We can now test this out with the following:

curl -H "Authorization: Bearer secrettoken" http://localhost:8080/auth

Which sends our super secret token through.
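
If the token is missing or wrong, the Lua check rejects the request before it ever reaches the upstream. You should see something along these lines:

curl -i http://localhost:8080/auth
HTTP/1.1 401 Unauthorized
...
Unauthorized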

Lua runs inside NGINX’s event loop. It’s lightweight, fast, and perfect for request-time filtering, routing, or even metrics.

Proxying to a Unix Domain Socket

Sometimes your app listens on a Unix domain socket, not a TCP port. NGINX can talk to those too:

First, we’ll start a simple server running locally with socat.

socat -v UNIX-LISTEN:/tmp/mysock,reuseaddr,fork SYSTEM:'while read line; do echo "You said: $line"; done'

The socket is created by the host user, but inside the container, NGINX runs as a different user (typically nobody). To avoid a “permission denied” error, I made the socket world-accessible:

chmod 777 /tmp/mysock

Then configure NGINX:

server {
    listen 8080;

    location /socket/ {
        proxy_pass http://unix:/tmp/mysock:;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Now we can run this with the following:

docker run --rm -it \
    -p 8080:8080 \
    -v "$PWD/nginx.conf:/usr/local/openresty/nginx/conf/nginx.conf:ro" \
    -v "$PWD/conf.d/unix.conf:/etc/nginx/conf.d/default.conf:ro" \
    -v "/tmp/mysock:/tmp/mysock" \
    openresty/openresty:alpine

You can see that we’ve needed to mount in our socket.

Wrap-up

NGINX isn’t just a dumb proxy — with OpenResty and some careful configuration, it becomes a programmable router and request gatekeeper. If you’re building internal tools, APIs, or secure microservices — this setup is gold.

Exploring Continuations and CPS Transformation

Introduction

Continuations are one of those ideas that seem abstract at first — but once they click, you’ll start seeing them everywhere: in asynchronous code, in exception handling, in generators, even in the way you reason about program flow.

In this post, we’ll explore what continuations are, how Continuation-Passing Style (CPS) transforms your code, and how these concepts are quietly powering modern async constructs like async/await.

We’ll walk through synchronous code, asynchronous patterns with callbacks and promises, and finally reach a new understanding of what’s really going on under the hood. You won’t need to reimplement a language or build a compiler to follow along — we’ll do everything with regular JavaScript, Python, and Rust.

What is a Continuation?

A continuation is a representation of “what to do next” in a program. At any point in your code, the rest of the computation can be thought of as a function — the continuation.

Let’s start simple:

function addOne(x) {
  return x + 1;
}

console.log(addOne(2)); // prints 3

Now, instead of returning, what if we passed the result to another function — the continuation?

function addOneCPS(x, cont) {
  cont(x + 1);
}

addOneCPS(2, (result) => {
  console.log(result); // prints 3
});

This style is called Continuation-Passing Style (CPS). In CPS, functions never return — they call their continuation instead.

This isn’t immediately remarkable. Callbacks have been used widely in code for quite some time now. This is just building a picture of where we’ve come from.

From Sync Code to CPS

Let’s expand our example to chain two operations:

function double(x) {
  return x * 2;
}

function addOne(x) {
  return x + 1;
}

console.log(addOne(double(5))); // 11

Now in CPS:

function doubleCPS(x, cont) {
  cont(x * 2);
}

function addOneCPS(x, cont) {
  cont(x + 1);
}

doubleCPS(5, (doubled) => {
  addOneCPS(doubled, (result) => {
    console.log(result); // 11
  });
});

We’ve restructured our program so that each step explicitly passes control to the next.

This may seem verbose, but it turns out to be extremely powerful — especially when dealing with asynchronous code.

CPS is Everywhere in JavaScript

Let’s look at an example from the Node.js world:

const fs = require("fs");

fs.readFile("data.txt", "utf8", (err, data) => {
  if (err) return console.error("Failed to read file:", err);
  processData(data, (result) => {
    console.log("Result:", result);
  });
});

This is literally CPS — instead of returning data, fs.readFile passes it to a callback.

What’s the callback? The continuation.

Promises: CPS with a Better API

JavaScript promises are built around the same idea, just cleaner:

fetch("/api/data")
  .then((res) => res.json())
  .then((data) => {
    console.log("Fetched:", data);
  })
  .catch((err) => {
    console.error("Error:", err);
  });

Every .then(fn) is passing the result to a new continuation function. But promises let us flatten the nesting and chain more cleanly.

async/await: The Illusion of Sync

Now the same thing with async/await:

async function loadData() {
  try {
    const res = await fetch("/api/data");
    const data = await res.json();
    console.log("Fetched:", data);
  } catch (err) {
    console.error("Error:", err);
  }
}

But this isn’t “real” synchronous code — it just looks like it.

Behind the scenes, the JavaScript engine is:

  • Splitting the function into pieces at each await
  • Saving the continuation (the rest of the function) in a hidden state machine
  • Resuming that continuation when the awaited promise resolves

That’s CPS at work.
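
Rust’s async functions desugar along the same lines, so we can sketch the transformation by hand. This is a hypothetical state machine with one state per await point, an illustration of the idea rather than the compiler’s actual output:

// One variant per "place we can be paused".
enum Task {
    Start,
    AwaitingFetch, // suspended at `await fetch(...)`
    Done,
}

impl Task {
    // Each call runs the code between two await points, then records
    // where to continue from next time.
    fn resume(&mut self, fetched: Option<String>) {
        match std::mem::replace(self, Task::Done) {
            Task::Start => {
                println!("requesting /api/data ...");
                *self = Task::AwaitingFetch; // pause; the runtime resumes us later
            }
            Task::AwaitingFetch => {
                println!("Fetched: {}", fetched.expect("result of the awaited fetch"));
                *self = Task::Done;
            }
            Task::Done => {}
        }
    }
}

fn main() {
    let mut task = Task::Start;
    task.resume(None);                          // run up to the first await
    task.resume(Some("response body".into()));  // the "promise" resolved; continue
}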

Manual CPS in Other Languages

You don’t need a JavaScript engine to try this out. Let’s look at Rust and Python examples to see how CPS can be expressed in ordinary code.

Rust Example

fn double_cps(x: i32, cont: impl FnOnce(i32)) {
    cont(x * 2);
}

fn add_one_cps(x: i32, cont: impl FnOnce(i32)) {
    cont(x + 1);
}

fn main() {
    double_cps(5, |doubled| {
        add_one_cps(doubled, |result| {
            println!("Result: {}", result); // prints 11
        });
    });
}

Rust’s closures let us express continuations cleanly without needing async runtimes or macros.

Python Example

The same example can be implemented in Python pretty simply.

def double_cps(x, cont):
    cont(x * 2)

def add_one_cps(x, cont):
    cont(x + 1)

double_cps(5, lambda doubled:
    add_one_cps(doubled, lambda result:
        print("Result:", result)
    )
)

Python’s lambdas work just like JavaScript’s arrow functions here. Every step is chained by explicitly passing the next operation as a continuation.

async/await in Python: CPS in the Runtime

Just like JavaScript, Python’s async def and await are built on top of generators and continuations:

import asyncio

async def get_data():
    await asyncio.sleep(1)
    return 42

async def main():
    value = await get_data()
    print("Got:", value)

asyncio.run(main())

Here, too, the interpreter:

  • Splits your function at each await
  • Stores the rest of the computation (continuation) to run later

Why Learn CPS?

Once you can see continuations, you start to understand:

  • How async runtimes work
  • How exception handling works (dual continuations!)
  • How interpreters implement tail calls and coroutines
  • How to reason about control flow in state machines

Bonus: Control Flow as a Tree

You can visualize your program’s control flow like a tree. Each continuation is a branch.

graph TD
  A[Start] --> B[doubleCPS]
  B --> C[addOneCPS]
  C --> D[print]

When you await, you’re pausing on one branch — and resuming it later.

Conclusion

Continuations and CPS transform how we think about execution.

They explain:

  • Why callbacks exist
  • How async/await works under the hood
  • How control flow can be captured, resumed, or redirected

You don’t need to write a compiler to use CPS — just pass the “rest of your program” as a function.

By making the invisible visible, continuations give us precise control over what our code does next.

A Practical Guide to Compiler Phases

Introduction

Compilers often sound mysterious — full of dragons, jargon, and intimidating diagrams. But under the hood, they all follow the same basic blueprint.

In this post, we’ll walk through every major compiler phase by implementing them in Rust. We’ll start with a raw source string and end with real output — step by step — using this ultra-simple language:

let x = 1 + 2;

We’ll transform that line through tokenization, parsing, type checking, intermediate representation, optimization, and finally evaluation — all in plain Rust. Every phase will be practical, with real code you can run. By the end, you’ll have a minimal but complete compiler front-end — a solid foundation you can build on.

Here’s a high-level view of the journey:

graph TD
  A["Source Code"] --> B["Lexical Analysis (Tokenizer)"]
  B --> C["Parsing (AST Builder)"]
  C --> D["Semantic Analysis (Type Checking)"]
  D --> E["IR Generation (Lowering to Intermediate Form)"]
  E --> F["Optimization (Constant Folding, etc.)"]
  F --> G["Code Generation (Assembly or Interpreter)"]
  G --> H["Final Output (Binary or Result)"]

Let’s dive in!

Lexical Analysis

Lexical analysis — also called tokenization or scanning — is the first phase of a compiler. It takes a stream of raw characters (your source code) and breaks it into tokens: the smallest meaningful units of the language.

Tokens include things like keywords (let), identifiers (x), operators (+), and literals (1, 2). Once tokenized, the compiler can start to reason about structure instead of individual characters.

In our example:

let x = 1 + 2;

The tokenizer will produce something like:

[Let, Identifier("x"), Equals, Integer(1), Plus, Integer(2), Semicolon]

So, we need to define our tokens:

#[derive(Debug, PartialEq)]
pub enum Token {
    Let,
    Identifier(String),
    Equals,
    Integer(i64),
    Plus,
    Semicolon,
}

Now we need to actually turn a string into a set of tokens.

pub fn tokenize(input: &str) -> Vec<Token> {
    let mut chars = input.chars().peekable();
    let mut tokens = Vec::new();

    while let Some(&ch) = chars.peek() {
        match ch {
            ' ' | '\n' | '\t' => {
                chars.next();
            }

            '=' => {
                chars.next();
                tokens.push(Token::Equals);
            }

            '+' => {
                chars.next();
                tokens.push(Token::Plus);
            }

            ';' => {
                chars.next();
                tokens.push(Token::Semicolon);
            }

            '0'..='9' => {
                let mut num = String::new();
                while let Some('0'..='9') = chars.peek() {
                    num.push(chars.next().unwrap());
                }
                let value = num.parse().unwrap();
                tokens.push(Token::Integer(value));
            }

            'a'..='z' | 'A'..='Z' | '_' => {
                let mut ident = String::new();
                while let Some(ch) = chars.peek() {
                    if ch.is_alphanumeric() || *ch == '_' {
                        ident.push(chars.next().unwrap());
                    } else {
                        break;
                    }
                }

                match ident.as_str() {
                    "let" => tokens.push(Token::Let),
                    _ => tokens.push(Token::Identifier(ident)),
                }
            }

            _ => panic!("Unexpected character: {}", ch),
        }
    }

    tokens
}

You can see that there’s no real validation going on here. We’re just turning significant characters into tokens. The only validation that does occur is a panic when we hit a character the tokenizer doesn’t support.
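
A quick way to convince yourself the tokenizer works is to print the tokens for our example line:

fn main() {
    let tokens = tokenize("let x = 1 + 2;");
    println!("{:?}", tokens);
    // [Let, Identifier("x"), Equals, Integer(1), Plus, Integer(2), Semicolon]
}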

Parsing

Now that we have a stream of tokens, it’s time to make sense of them structurally. That’s the job of parsing — converting a flat list of tokens into a tree-like structure that represents the hierarchy and meaning of the code.

This structure is called an Abstract Syntax Tree (AST).

An AST is a simplified, structured representation of your source code that captures its grammatical structure — without worrying about superficial syntax details like commas, parentheses, or whitespace. It lets the compiler understand what the code means rather than just what it looks like.

Taking our simple example from above, we want to produce a tree that looks like this:

graph TD A["LetBinding"] A1["name: 'x'"] A2["value: BinaryOp"] A --> A1 A --> A2 B1["left: Literal(1)"] B2["op: '+'"] B3["right: Literal(2)"] A2 --> B1 A2 --> B2 A2 --> B3

The parser turns our flat tokens into this rich structure — making it possible for later phases (like type checking and code generation) to analyze, transform, and ultimately execute the code.

An expression in our parser can be either a literal value or a binary operation. You’ll notice that BinaryOp refers back to Expr, making the type recursive.

#[derive(Debug)]
pub enum Expr {
    Literal(i64),
    BinaryOp {
        left: Box<Expr>,
        op: char,
        right: Box<Expr>,
    },
}

#[derive(Debug)]
pub struct LetBinding {
    pub name: String,
    pub value: Expr,
}

The LetBinding type binds an expression to a symbolic name.

Now we can take a list of tokens, and parse that out into a LetBinding:

pub fn parse(tokens: &[Token]) -> LetBinding {
    let mut pos = 0;

    fn expect_token(tokens: &[Token], pos: &mut usize, expected: &Token) {
        if &tokens[*pos] != expected {
            panic!("Expected {:?}, got {:?}", expected, tokens[*pos]);
        }
        *pos += 1;
    }

    expect_token(tokens, &mut pos, &Token::Let);

    let name = match &tokens[pos] {
        Token::Identifier(s) => {
            pos += 1;
            s.clone()
        }
        other => panic!("Expected identifier, got {:?}", other),
    };

    expect_token(tokens, &mut pos, &Token::Equals);

    let left = match &tokens[pos] {
        Token::Integer(n) => {
            pos += 1;
            Expr::Literal(*n)
        }
        _ => panic!("Expected integer"),
    };

    let op = match &tokens[pos] {
        Token::Plus => {
            pos += 1;
            '+'
        }
        _ => panic!("Expected '+'"),
    };

    let right = match &tokens[pos] {
        Token::Integer(n) => {
            pos += 1;
            Expr::Literal(*n)
        }
        _ => panic!("Expected integer"),
    };

    expect_token(tokens, &mut pos, &Token::Semicolon);

    LetBinding {
        name,
        value: Expr::BinaryOp {
            left: Box::new(left),
            op,
            right: Box::new(right),
        },
    }
}
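
Printing the result of parse with {:?} for our example gives a tree that mirrors the diagram above:

LetBinding { name: "x", value: BinaryOp { left: Literal(1), op: '+', right: Literal(2) } }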

Semantic Analysis

Once we have an Abstract Syntax Tree (AST), we know how the code is structured. But we still don’t know if the code makes sense.

That’s where semantic analysis comes in. This phase checks the meaning of the code — validating things like:

  • Are all variables declared before they’re used?
  • Are types used correctly?
  • Can the operations actually be performed?

Even though let x = 1 + 2; is syntactically valid, we still need to ensure the types on both sides of + are compatible, and that x is a valid target for assignment.

Semantic analysis walks the AST and performs:

  • Type checking (e.g., can you add these two things?)
  • Scope resolution (e.g., is this variable declared?)
  • Error reporting for violations (e.g., type mismatches)

We need to tell our compiler about types.

#[derive(Debug, PartialEq)]
enum Type {
    Int,
}

Now we can lean on the recursive nature of our Expr enum and check it recursively. Very simply, we only support one type: Int.

fn type_check(expr: &Expr) -> Result<Type, String> {
    match expr {
        Expr::Literal(_) => Ok(Type::Int),
        Expr::BinaryOp { left, right, .. } => {
            let lt = type_check(left)?;
            let rt = type_check(right)?;
            if lt == Type::Int && rt == Type::Int {
                Ok(Type::Int)
            } else {
                Err("Type mismatch".into())
            }
        }
    }
}
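
As a quick sanity check, reusing the tokenizer and parser from the earlier sections:

fn main() {
    let ast = parse(&tokenize("let x = 1 + 2;"));
    assert_eq!(type_check(&ast.value), Ok(Type::Int));
}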

Intermediate Representation (IR)

At this point, we have code that’s both structurally valid (parsed into an AST) and semantically correct (passes type checking). Now it’s time to lower that code into something simpler and easier to manipulate: an Intermediate Representation, or IR.

Think of IR as the “compiler’s private language” — a stripped-down version of your program that’s:

  • Easier to optimize
  • Easier to analyze
  • Easier to transform into real machine instructions

In real-world compilers, IR might take the form of:

  • LLVM IR: used by Rust itself
  • MIR: Rust’s internal mid-level representation
  • Bytecode: as used by virtual machines like the JVM

In our toy compiler, we’ll build a stack-based IR — a sequence of simple instructions like:

LoadConst 1
LoadConst 2
Add
Store x

These IR instructions are:

  • Flat (no nesting like ASTs)
  • Uniform (each step is explicit)
  • Machine-friendly (easy to interpret or compile)

This gives us a stable, minimal foundation for:

  • Optimizations like constant folding
  • Code generation for interpreters or assembly
  • Debugging and instrumentation

We’ll now:

  • Define an IR instruction set in Rust
  • Walk the AST to emit IR
  • Prepare for later phases like optimization and evaluation

This is the compiler’s bridge between high-level intent and low-level action.

We need a way to define our IR:

// Clone is needed so the optimizer can return the instructions unchanged via to_vec().
#[derive(Debug, Clone)]
pub enum IR {
    LoadConst(i64),
    Add,
    Store(String),
}

Now we can generate IR from our LetBinding:

pub fn generate_ir(binding: &LetBinding) -> Vec<IR> {
    let mut ir = Vec::new();

    match &binding.value {
        Expr::BinaryOp { left, right, op } => {
            if let Expr::Literal(l) = **left {
                ir.push(IR::LoadConst(l));
            }
            if let Expr::Literal(r) = **right {
                ir.push(IR::LoadConst(r));
            }

            if *op == '+' {
                ir.push(IR::Add);
            }
        }
        _ => panic!("Unsupported expression"),
    }

    ir.push(IR::Store(binding.name.clone()));
    ir
}
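
For our example binding, generate_ir produces (printed with {:?}):

[LoadConst(1), LoadConst(2), Add, Store("x")]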

Optimization

Once we have our Intermediate Representation (IR), we can start to improve it.

That’s what the optimization phase is all about — rewriting the IR to make it:

  • Faster to run
  • Simpler to execute
  • More efficient in terms of operations

Crucially, optimizations don’t change the meaning of the program. They just make it better.

In our toy compiler, we’ll implement a classic example: constant folding. This is when the compiler evaluates constant expressions ahead of time.

Instead of this:

LoadConst 1
LoadConst 2
Add
Store x

We generate this:

LoadConst 3
Store x

That means less work at runtime — and it’s a stepping stone to more advanced techniques like dead code elimination, common subexpression elimination, or register allocation.

Even in small compilers, optimization is important because:

  • It reduces unnecessary instructions
  • It prepares the IR for efficient code generation
  • It gives you experience with real-world compiler passes

In this section, we’ll:

  • Walk through our IR instructions
  • Detect simple constant patterns
  • Replace them with pre-computed values

The logic will be basic — but the mechanism will mirror what real compilers do at massive scale.

pub fn optimize(ir: &[IR]) -> Vec<IR> {
    if let [IR::LoadConst(a), IR::LoadConst(b), IR::Add, IR::Store(name)] = &ir[..] {
        vec![
            IR::LoadConst(a + b),
            IR::Store(name.clone()),
        ]
    } else {
        ir.to_vec()
    }
}

Code Generation / Evaluation

Rather than generating x86 assembly, let’s just interpret the IR.

use std::collections::HashMap;

pub fn run(ir: &[IR]) {
    let mut stack = Vec::new();
    let mut vars = HashMap::new();

    for instr in ir {
        match instr {
            IR::LoadConst(n) => stack.push(*n),
            IR::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            IR::Store(name) => {
                let val = stack.pop().unwrap();
                vars.insert(name.clone(), val);
                println!("{} = {}", name, val);
            }
        }
    }
}

Putting it all together

Now we can use each of these functions in a pretty neat sequence:

fn main() {
    let source = "let x = 1 + 2;";
    let tokens = tokenize(source);
    let parsed = parse(&tokens);
    let _ = type_check(&parsed.value).expect("Type error");
    let ir = generate_ir(&parsed);
    let opt_ir = optimize(&ir);
    run(&opt_ir);
}

Output:

x = 3

Summary

Well, that’s the basic anatomy of a compiler!

We’ve built the start of something here that can be extended at every step: a richer grammar, more types, more optimizations, or a real code generator.