Cogs and Levers A blog full of technical stuff

Understanding Pin-safe Types in Rust

Introduction

Rust is famous for giving you memory safety without a garbage collector. But when you start doing lower-level work — self-referential structs, async state machines, or FFI — you run into a powerful but mysterious feature: Pin.

In this article, we’ll answer the following:

  • What does it mean for a type to be pin-safe?
  • Why would a type need to be pinned in the first place?
  • How do you build one safely — without fighting the borrow checker?

We’ll walk through simple examples first, then build up to a self-referential type.

What is a Pin-safe Type?

A pin-safe type is a type that can be safely used with Rust’s Pin API.

Pin-safe: the type is never moved in memory after being pinned, and any unsafe code it contains is written to uphold that guarantee.

You create a Pin-safe type when:

  • You need to guarantee that a value won’t move in memory after being created.
  • You want to allow self-referencing inside a struct (e.g., a field pointing to another field).
  • You’re building async state machines, generators, or intrusive data structures.

Self-Referential Structs: The Core Problem

Let’s look at a classic case. Say you have a struct like this:

struct Example {
    a: String,
    b: *const String, // b points to a
}

This is a self-referential struct: b stores a pointer to another field inside the same struct.

Seems harmless?

Here’s the catch: Rust moves values around freely — into function calls, collections, etc. If you set up a pointer inside a struct and then move the struct, your pointer is now invalid. This opens the door to use-after-free bugs.
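To make the problem concrete, here is a small sketch that reuses the Example struct from above. The function name stale_pointer_after_move is made up for this illustration. It sets b to point at a, moves the struct into a Box, and returns both the address stored in b and the address where a now lives. The pointer is only compared, never dereferenced, so no unsafe code is needed.

```rust
struct Example {
    a: String,
    b: *const String, // b is meant to point at a
}

/// Returns (address stored in b, real address of a) after the struct has moved.
fn stale_pointer_after_move() -> (usize, usize) {
    let mut ex = Example {
        a: String::from("hello"),
        b: std::ptr::null(),
    };
    // Point b at a while the struct still lives on the stack.
    ex.b = &ex.a as *const String;

    // Moving the struct into a Box copies its bytes to a heap allocation.
    // b is copied along with everything else and still holds the old address.
    let boxed = Box::new(ex);

    (boxed.b as usize, &boxed.a as *const String as usize)
}
```

The two addresses no longer agree: b still refers to the old stack slot, so dereferencing it would be undefined behaviour.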

Rust’s borrow checker normally prevents you from doing this. But sometimes you do need this — and that’s where Pin comes in.

Pin to the Rescue

Pin<P> wraps a pointer type P and says: once the value it points to is pinned, it cannot be moved again (unless its type is Unpin).

This is perfect for self-referential types — it guarantees their memory address won’t change.

But you have to build your type carefully to uphold this contract.

A Pin-safe Self-Referential Type

Now let’s build a Pin-safe type step-by-step.

Step 1: Define the structure

use std::pin::Pin;
use std::marker::PhantomPinned;

struct SelfRef {
    data: String,
    data_ref: *const String, // raw pointer, not a safe Rust reference
    _pin: PhantomPinned,     // opt-out of Unpin
}
  • data: holds some content.
  • data_ref: stores a pointer to that data.
  • PhantomPinned: tells Rust this type is not safe to move after being pinned.
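To see what opting out of Unpin actually changes, it helps to look at a type that is Unpin, such as i32. The sketch below (swap_through_pin is a name invented for this example) shows that Pin places no restriction on Unpin types: we can pin two integers, get plain mutable references back out with the safe Pin::get_mut, and swap them.

```rust
use std::pin::Pin;

/// i32 is Unpin, so pinning it does not stop us from moving it.
fn swap_through_pin() -> (i32, i32) {
    let mut a = 1;
    let mut b = 2;
    {
        // Pin::new is only available when the target type is Unpin.
        let pa: Pin<&mut i32> = Pin::new(&mut a);
        let pb: Pin<&mut i32> = Pin::new(&mut b);
        // Pin::get_mut is also Unpin-only; it hands back a normal &mut.
        std::mem::swap(pa.get_mut(), pb.get_mut());
    }
    (a, b)
}
```

With a PhantomPinned field in the struct, neither Pin::new nor Pin::get_mut is available for it, which is why SelfRef has to be pinned with Box::pin and mutated through the unsafe get_unchecked_mut.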

Step 2: Pin and initialize

impl SelfRef {
    fn new(data: String) -> Pin<Box<SelfRef>> {
        let mut s = Box::pin(SelfRef {
            data,
            data_ref: std::ptr::null(),
            _pin: PhantomPinned,
        });

        let data_ref = &s.data as *const String;

        unsafe {
            let mut_ref = Pin::as_mut(&mut s);
            Pin::get_unchecked_mut(mut_ref).data_ref = data_ref;
        }

        s
    }

    fn get(&self) -> &String {
        unsafe { &*self.data_ref }
    }
}

Step 3: Use it

fn main() {
    let s = SelfRef::new("Hello, world!".into());
    println!("Data ref points to: {}", s.get());
}

Key Points

  • You must pin the struct before setting the self-reference.
  • Box::pin allocates it on the heap and returns a pinned pointer.
  • PhantomPinned makes the type !Unpin, so safe code can’t get a &mut SelfRef out of the Pin and move the value.
  • unsafe is required to set the internal pointer — you must guarantee it’s only done after pinning.
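As a quick check of the points above, here is a self-contained sketch that repeats SelfRef and adds two helpers that are not part of the original code: points_to_self and moved_handle_still_valid. It moves the Pin<Box<SelfRef>> handle into a Vec and confirms that the internal pointer still refers to the data field, because only the handle moved and the heap allocation stayed put.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRef {
    data: String,
    data_ref: *const String,
    _pin: PhantomPinned,
}

impl SelfRef {
    fn new(data: String) -> Pin<Box<SelfRef>> {
        let mut s = Box::pin(SelfRef {
            data,
            data_ref: std::ptr::null(),
            _pin: PhantomPinned,
        });
        let data_ref: *const String = &s.data;
        // SAFETY: we only overwrite a field in place; the value is never
        // moved out of its heap allocation.
        unsafe {
            s.as_mut().get_unchecked_mut().data_ref = data_ref;
        }
        s
    }

    fn get(&self) -> &String {
        // SAFETY: data_ref was set in new() and the value has not moved since.
        unsafe { &*self.data_ref }
    }

    /// Hypothetical helper: does data_ref still point at our own data field?
    fn points_to_self(&self) -> bool {
        std::ptr::eq(self.data_ref, &self.data)
    }
}

/// Moves the pinned handle into a Vec and checks the self-reference again.
fn moved_handle_still_valid() -> bool {
    let s = SelfRef::new("Hello".to_string());
    let mut holder = Vec::new();
    holder.push(s); // moves the Pin<Box<..>> handle, not the SelfRef itself
    holder[0].points_to_self() && holder[0].get().as_str() == "Hello"
}
```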

Summary Table

Concept           Example                            Needs Pin?  Why?
Normal struct     Logger { name: String }            No          No self-references
Self-referential  SelfRef { data, data_ref: &data }  Yes         Unsafe if moved
Async generators  async fn or Future                 Yes         Compiler may generate self-refs
FFI callbacks     extern "C" with inner pointers     Yes         Must stay in place for C code

Conclusion

Most types in Rust are move-safe and don’t need Pin. But when you’re working with:

  • self-referential structs,
  • low-level async primitives,
  • foreign function interfaces (FFI),

…you may need to reach for Pin.

A Pin-safe type is your promise to the compiler that “this won’t move again — and I’ve made sure everything inside is OK with that.”

Building a Stack-Based VM in Rust - Part 5

Introduction

In Part 4, we introduced named words and a dictionary that allowed our VM to call subroutines by name. But there was still one major gap:

We couldn’t write Forth-like code.

You still had to manually build vec![Instruction::Push(5), ...] in Rust. That changes now.

In this post, we’ll add a hand-rolled parser that understands simple Forth syntax — including word definitions like : square dup * ; — and emits instructions automatically.

By the end of this part, you’ll be able to write:

5 square : square dup * ;

And run it on your virtual machine with no hardcoded addresses or manual instruction building.

The Goal

Our parser will:

  • Tokenize simple Forth input
  • Track whether we’re inside a : definition
  • Split instructions into main and definitions
  • Insert a Halt after top-level code to prevent fall-through
  • Track the correct addresses for word definitions
  • Build a final list of instructions ready to run

Here’s the full updated code.

Parser Support

First of all, we define our Parser struct — this separates the parsing logic from the VM runtime.

struct Parser {
    main: Vec<Instruction>,
    definitions: Vec<Instruction>,
    dictionary: HashMap<String, usize>,
}

Here’s what each member does:

  • main: Top-level code that runs first. This is where calls like 5 square are emitted.
  • definitions: Instructions belonging to : word ... ; definitions.
  • dictionary: A mapping of word names (like "square") to their starting address in the final instruction stream.

We initialize the parser with empty sections:

impl Parser {
    fn new() -> Self {
        Self {
            main: Vec::new(),
            definitions: Vec::new(),
            dictionary: HashMap::new(),
        }
    }
}

Token Parsing

The heart of the parser is the parse method. We split the input on whitespace and interpret each token in turn.

fn parse(&mut self, input: &str) {
    let mut tokens = input.split_whitespace().peekable();
    let mut defining: Option<String> = None;
    let mut buffer: Vec<Instruction> = Vec::new();

    while let Some(token) = tokens.next() {
        match token {
            ":" => {
                // Beginning of a word definition
                let name = tokens.next().expect("Expected word name after ':'");
                defining = Some(name.to_string());
                buffer.clear();
            }
            ";" => {
                // End of word definition
                if let Some(name) = defining.take() {
                    buffer.push(Instruction::Return);
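                    // +1 for HALT. This uses main.len() as it is right now, so the address
                    // is only correct if no top-level code follows this definition.
                    let addr = self.main.len() + self.definitions.len() + 1;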
                    self.dictionary.insert(name, addr);
                    self.definitions.extend(buffer.drain(..));
                } else {
                    panic!("Unexpected ';' outside of word definition");
                }
            }
            word => {
                // Otherwise, parse an instruction
                let instr = if let Ok(n) = word.parse::<i32>() {
                    Instruction::Push(n)
                } else {
                    match word {
                        "dup" => Instruction::Dup,
                        "drop" => Instruction::Drop,
                        "swap" => Instruction::Swap,
                        "over" => Instruction::Over,
                        "+" => Instruction::Add,
                        "*" => Instruction::Mul,
                        "depth" => Instruction::Depth,
                        _ => Instruction::CallWord(word.to_string()),
                    }
                };

                // Add to appropriate section
                if defining.is_some() {
                    buffer.push(instr);
                } else {
                    self.main.push(instr);
                }
            }
        }
    }
}

Breakdown of the cases:

  • : begins a new named word definition.
  • ; ends the definition, emits a Return, and stores the word’s starting address in the dictionary.
  • A number becomes a Push(n) instruction.
  • Built-in words like + and * become direct Instruction variants.
  • Any unknown token is assumed to be a user-defined word, and gets translated to CallWord("name").
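One thing to watch in the ; case: the address is computed from main.len() at the moment the definition ends, so it only comes out right when all top-level code appears before the definitions, as in 5 square : square dup * ;. A possible way around that, sketched here under my own assumptions rather than taken from the series' code, is to record each word's offset within the definitions section and turn it into an absolute address in finalize, once the length of main is known. The offsets field and the compile helper are hypothetical names, and the instruction set is trimmed down to keep the example short.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Instruction { Push(i32), Dup, Mul, CallWord(String), Return, Halt }

struct Parser {
    main: Vec<Instruction>,
    definitions: Vec<Instruction>,
    offsets: HashMap<String, usize>, // word -> offset inside `definitions`
}

impl Parser {
    fn new() -> Self {
        Self { main: Vec::new(), definitions: Vec::new(), offsets: HashMap::new() }
    }

    fn parse(&mut self, input: &str) {
        let mut tokens = input.split_whitespace();
        let mut defining: Option<String> = None;
        let mut start = 0;
        while let Some(token) = tokens.next() {
            match token {
                ":" => {
                    let name = tokens.next().expect("Expected word name after ':'");
                    defining = Some(name.to_string());
                    start = self.definitions.len(); // where this word begins
                }
                ";" => {
                    let name = defining
                        .take()
                        .expect("Unexpected ';' outside of word definition");
                    self.definitions.push(Instruction::Return);
                    self.offsets.insert(name, start);
                }
                word => {
                    let instr = match word.parse::<i32>() {
                        Ok(n) => Instruction::Push(n),
                        Err(_) => match word {
                            "dup" => Instruction::Dup,
                            "*" => Instruction::Mul,
                            _ => Instruction::CallWord(word.to_string()),
                        },
                    };
                    if defining.is_some() {
                        self.definitions.push(instr);
                    } else {
                        self.main.push(instr);
                    }
                }
            }
        }
    }

    fn finalize(self) -> (Vec<Instruction>, HashMap<String, usize>) {
        let base = self.main.len() + 1; // definitions start after main + HALT
        let mut instructions = self.main;
        instructions.push(Instruction::Halt);
        instructions.extend(self.definitions);
        let dictionary = self
            .offsets
            .into_iter()
            .map(|(name, offset)| (name, base + offset))
            .collect();
        (instructions, dictionary)
    }
}

/// Convenience wrapper for the example: parse and finalize in one go.
fn compile(src: &str) -> (Vec<Instruction>, HashMap<String, usize>) {
    let mut parser = Parser::new();
    parser.parse(src);
    parser.finalize()
}
```

With this variant, : square dup * ; 5 square and 5 square : square dup * ; both place square at address 3.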

Finalizing the Program

Once parsing is complete, we combine the main program with definitions — separated by a Halt to ensure we don’t fall through.

fn finalize(self) -> (Vec<Instruction>, HashMap<String, usize>) {
    let mut instructions = self.main;
    instructions.push(Instruction::Halt); // Halts after main program
    instructions.extend(self.definitions);
    (instructions, self.dictionary)
}

Main Program

Our main() function now uses the parser to construct the program from a Forth-style string.

fn main() {
    let mut parser = Parser::new();
    parser.parse("5 square : square dup * ;");

    let (instructions, dictionary) = parser.finalize();

    for instr in &instructions {
        println!("{:?}", instr);
    }

    let mut vm = VM::new(instructions);
    vm.dictionary = dictionary;
    vm.run();

    println!("Final stack: {:?}", vm.stack); 
}

You should see the following output:

Push(5)
CallWord("square")
Halt
Dup
Mul
Return
Final stack: [25]

Conclusion

This was a big leap forward: we now parse and run real Forth-like programs, entirely from text.

The parser separates top-level code from definitions, calculates addresses correctly, inserts a Halt, and builds a dictionary of reusable named words.

We now have:

  • A working VM
  • An extensible instruction set
  • Named words and subroutines
  • A parser for Forth-style input

The code for this part can be found up on GitHub.

Building a Stack-Based VM in Rust - Part 4

Introduction

In Part 3, we introduced control flow and subroutines into our virtual machine. That gave us branching logic and reusable code blocks — a huge step forward.

But one core Forth idea is still missing: the ability to define and name new words.

In this part, we’ll add a dictionary to our VM and support calling reusable routines by name. This will allow us to define Forth-style words like:

: square dup * ;
5 square

Let’s get into it.

The Concept of a “Word”

In Forth, a word is any named function — even built-ins like + and * are just words. User-defined words are created using : and ;, and then they behave just like native instructions.

To support this, we need:

  • A dictionary mapping word names to addresses
  • An instruction that can call a word by name
  • A way to define new words at specific locations in the program

Extending the Instruction Set

First, we extend our enum to support calling named words:

enum Instruction {
    Push(i32),
    Add,
    Mul,
    Dup,
    Drop,
    Swap,
    Over,
    Rot,
    Nip,
    Tuck,
    TwoDup,
    TwoDrop,
    TwoSwap,
    Depth,
    Jump(isize),
    IfZero(isize),
    Call(usize),
    CallWord(String),     // new
    Return,
    Halt,
}

The new CallWord(String) variant allows us to write programs that reference named words directly.

Adding a Dictionary

Next, we update our VM structure to store a dictionary:

use std::collections::HashMap;

struct VM {
    stack: Vec<i32>,
    program: Vec<Instruction>,
    ip: usize,
    return_stack: Vec<usize>,
    dictionary: HashMap<String, usize>,          // new
}

And initialize it in VM::new():

impl VM {
    fn new(program: Vec<Instruction>) -> Self {
        Self {
            stack: Vec::new(),
            program,
            ip: 0,
            return_stack: Vec::new(),
            dictionary: HashMap::new(),         // new
        }
    }
}

Adding New Words

We create a helper method to register a word at a specific address:

impl VM {
    fn add_word(&mut self, name: &str, address: usize) {
        self.dictionary.insert(name.to_string(), address);
    }
}

This lets us register any block of code under a name.

Calling Named Words

Now we implement the CallWord instruction in our dispatch loop:

Instruction::CallWord(name) => {
    let addr = self.dictionary.get(name)
        .unwrap_or_else(|| panic!("Unknown word: {}", name));
    self.return_stack.push(self.ip + 1);
    self.ip = *addr;
    continue;
}

This works just like Call, but performs a dictionary lookup first.
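To show how the pieces fit together, here is a cut-down, self-contained sketch of the VM with only the instructions that square needs. The shape of the run loop (a while loop with ip += 1 after each non-jumping instruction) is my assumption about how the dispatch loop is structured, and run_square is a helper made up for this example.

```rust
use std::collections::HashMap;

enum Instruction { Push(i32), Dup, Mul, CallWord(String), Return, Halt }

struct VM {
    stack: Vec<i32>,
    program: Vec<Instruction>,
    ip: usize,
    return_stack: Vec<usize>,
    dictionary: HashMap<String, usize>,
}

impl VM {
    fn new(program: Vec<Instruction>) -> Self {
        Self {
            stack: Vec::new(),
            program,
            ip: 0,
            return_stack: Vec::new(),
            dictionary: HashMap::new(),
        }
    }

    fn add_word(&mut self, name: &str, address: usize) {
        self.dictionary.insert(name.to_string(), address);
    }

    fn run(&mut self) {
        while self.ip < self.program.len() {
            match &self.program[self.ip] {
                Instruction::Push(n) => self.stack.push(*n),
                Instruction::Dup => {
                    let top = *self.stack.last().expect("Stack underflow on DUP");
                    self.stack.push(top);
                }
                Instruction::Mul => {
                    let b = self.stack.pop().expect("Stack underflow on MUL");
                    let a = self.stack.pop().expect("Stack underflow on MUL");
                    self.stack.push(a * b);
                }
                Instruction::CallWord(name) => {
                    // Look the word up, remember where to come back to, jump.
                    let addr = *self
                        .dictionary
                        .get(name)
                        .unwrap_or_else(|| panic!("Unknown word: {}", name));
                    self.return_stack.push(self.ip + 1);
                    self.ip = addr;
                    continue;
                }
                Instruction::Return => {
                    self.ip = self.return_stack.pop().expect("Return stack underflow");
                    continue;
                }
                Instruction::Halt => break,
            }
            self.ip += 1;
        }
    }
}

/// Builds `5 square` by hand and returns the final data stack.
fn run_square() -> Vec<i32> {
    let mut vm = VM::new(vec![
        Instruction::Push(5),
        Instruction::CallWord("square".to_string()),
        Instruction::Halt,
        Instruction::Dup, // square starts at index 3
        Instruction::Mul,
        Instruction::Return,
    ]);
    vm.add_word("square", 3);
    vm.run();
    vm.stack
}
```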

Example: Defining square

Here’s a complete program that defines and calls a square word:

let program = vec![
    Instruction::Push(5),
    Instruction::CallWord("square".to_string()),
    Instruction::Halt,

    // : square dup * ;
    Instruction::Dup,
    Instruction::Mul,
    Instruction::Return,
];

let mut vm = VM::new(program);
vm.add_word("square", 3); // definition starts at index 3
vm.run();

println!("Final stack: {:?}", vm.stack);

Output:

Final stack: [25]

We’ve now made it possible to extend the language from within the language — a hallmark of Forth.

Optional: Parsing : square dup * ;

Currently we define words manually by inserting them into the dictionary, but in true Forth style we’d like to write:

: square dup * ;
5 square

To support that, we’ll need a minimal parser or macro-assembler to convert high-level Forth code into VM instructions. This will be the focus of a future post.

Conclusion

In this post, we gave our VM the ability to define and call named words, which turns our stack machine into something far more expressive and composable.

Our VM now supports:

  • Arithmetic
  • Stack manipulation
  • Control flow and subroutines
  • A dictionary of named routines

In Part 5, we’ll push even further — implementing a simple parser that can read actual Forth-like text, resolve words, and build programs dynamically.

We’re getting very close to having a minimal, working Forth interpreter — and it’s all built in Rust.

The code for this part is available here on GitHub.

Building a Stack-Based VM in Rust - Part 3

Introduction

In Part 2, we extended our Forth-style virtual machine with a bunch of classic stack manipulation words — from OVER and ROT to 2DUP, 2SWAP, and more.

This gave our machine more expressive power, but it still lacked something crucial: control flow. In this part, we’ll fix that.

By adding branching and subroutine support, we allow our VM to make decisions and reuse logic — two foundational ideas in all real programming languages.

Control Flow in Stack Machines

Stack machines like Forth typically handle control flow through explicit instruction manipulation — that is, jumping to new parts of the program and returning when done.

We’ll implement:

Instruction     Stack Effect  Description
IfZero(offset)  ( n -- )      Jumps offset if top is zero
Jump(offset)    ( -- )        Always jumps offset
Call(addr)      ( -- )        Saves return address and jumps
Return          ( -- )        Pops return address and jumps to it

These instructions give us the power to create conditionals and function-like routines.

Extending the Instruction Set

Let’s extend our enum with the new operations:

enum Instruction {
    Push(i32),
    Add,
    Mul,
    Dup,
    Drop,
    Swap,
    Over,
    Rot,
    Nip,
    Tuck,
    TwoDup,
    TwoDrop,
    TwoSwap,
    Depth,
    Jump(isize),     // new
    IfZero(isize),   // new
    Call(usize),     // new
    Return,          // new
    Halt,
}

In order to support our ability to call subroutines, our virtual machine needs another stack. This stack is in charge of remembering where we came from so that we can return back to the correct place. The return stack is just another piece of state management for the virtual machine:

struct VM {
    stack: Vec<i32>,
    program: Vec<Instruction>,
    ip: usize,
    return_stack: Vec<usize>,       // new
}

And make sure VM::new() initializes that new return stack:

impl VM {
    fn new(program: Vec<Instruction>) -> Self {
        Self {
            stack: Vec::new(),
            program,
            ip: 0,
            return_stack: Vec::new(),       // new
        }
    }
}

Implementing Control Instructions

Each control instruction is added to the run() method just like any other:

JUMP

Unconditionally jumps to a new offset from the current instruction pointer.

Stack effect: ( -- )

Instruction::Jump(offset) => {
    self.ip = ((self.ip as isize) + offset) as usize;
    continue;
}

We use continue here because we don’t want to execute the usual ip += 1 after a jump.

IFZERO

Conditionally jumps based on the top stack value.

Stack effect: ( n -- )

Instruction::IfZero(offset) => {
    let cond = self.stack.pop().expect("Stack underflow on IFZERO");
    if cond == 0 {
        self.ip = ((self.ip as isize) + offset) as usize;
        continue;
    }
}

If the value is zero, we adjust ip by the offset. If not, we let the loop continue as normal.
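Relative offsets are easy to get wrong, so here is a small standalone sketch that puts Jump and IfZero to work. It is not the VM from this series: run is a free function with a trimmed instruction set, and branch and countdown are names invented for the example. Offsets are measured from the jump instruction itself, matching the arithmetic above.

```rust
enum Instruction { Push(i32), Dup, Add, Jump(isize), IfZero(isize), Halt }

/// Minimal interpreter: returns the data stack when the program halts.
fn run(program: &[Instruction]) -> Vec<i32> {
    let mut stack: Vec<i32> = Vec::new();
    let mut ip: usize = 0;
    while ip < program.len() {
        match &program[ip] {
            Instruction::Push(n) => stack.push(*n),
            Instruction::Dup => {
                let top = *stack.last().expect("Stack underflow on DUP");
                stack.push(top);
            }
            Instruction::Add => {
                let b = stack.pop().expect("Stack underflow on ADD");
                let a = stack.pop().expect("Stack underflow on ADD");
                stack.push(a + b);
            }
            Instruction::Jump(offset) => {
                ip = ((ip as isize) + *offset) as usize;
                continue; // skip the usual ip += 1
            }
            Instruction::IfZero(offset) => {
                let cond = stack.pop().expect("Stack underflow on IFZERO");
                if cond == 0 {
                    ip = ((ip as isize) + *offset) as usize;
                    continue;
                }
            }
            Instruction::Halt => break,
        }
        ip += 1;
    }
    stack
}

/// if/else: leaves 222 when cond is zero, 111 otherwise.
fn branch(cond: i32) -> Vec<i32> {
    run(&[
        Instruction::Push(cond), // 0
        Instruction::IfZero(3),  // 1: zero -> 1 + 3 = 4
        Instruction::Push(111),  // 2: the "non-zero" arm
        Instruction::Jump(2),    // 3: skip the other arm -> 3 + 2 = 5
        Instruction::Push(222),  // 4: the "zero" arm
        Instruction::Halt,       // 5
    ])
}

/// Loop with a backward jump: counts n down to zero.
fn countdown(n: i32) -> Vec<i32> {
    run(&[
        Instruction::Push(n),   // 0
        Instruction::Dup,       // 1: copy the counter for the test
        Instruction::IfZero(4), // 2: done -> 2 + 4 = 6
        Instruction::Push(-1),  // 3
        Instruction::Add,       // 4: counter - 1
        Instruction::Jump(-4),  // 5: back to the Dup -> 5 - 4 = 1
        Instruction::Halt,      // 6
    ])
}
```

Here branch(0) leaves [222] on the stack, branch(7) leaves [111], and countdown(3) loops until the counter reaches zero and leaves [0].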

CALL

Pushes the address of the next instruction (ip + 1) onto the return stack and jumps to the absolute address.

Stack effect: ( -- )

Instruction::Call(addr) => {
    self.return_stack.push(self.ip + 1);
    self.ip = *addr;
    continue;
}

We store ip + 1 so that Return knows where to go back to.

RETURN

Pops the return stack and jumps to that address.

Stack effect: ( -- )

Instruction::Return => {
    let ret = self.return_stack.pop().expect("Return stack underflow");
    self.ip = ret;
    continue;
}

This makes it possible to write reusable routines, just like functions.
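Because return addresses live on a stack, routines can call other routines. The standalone sketch below illustrates that with a hypothetical "fourth power" routine that calls square twice. As before, run is a trimmed-down free function rather than the VM from this series; it also reports the deepest the return stack got, purely for illustration.

```rust
enum Instruction { Push(i32), Dup, Mul, Call(usize), Return, Halt }

/// Returns the final data stack and the maximum return-stack depth reached.
fn run(program: &[Instruction]) -> (Vec<i32>, usize) {
    let mut stack: Vec<i32> = Vec::new();
    let mut return_stack: Vec<usize> = Vec::new();
    let mut max_depth: usize = 0;
    let mut ip: usize = 0;
    while ip < program.len() {
        match &program[ip] {
            Instruction::Push(n) => stack.push(*n),
            Instruction::Dup => {
                let top = *stack.last().expect("Stack underflow on DUP");
                stack.push(top);
            }
            Instruction::Mul => {
                let b = stack.pop().expect("Stack underflow on MUL");
                let a = stack.pop().expect("Stack underflow on MUL");
                stack.push(a * b);
            }
            Instruction::Call(addr) => {
                return_stack.push(ip + 1); // come back to the next instruction
                max_depth = max_depth.max(return_stack.len());
                ip = *addr;
                continue;
            }
            Instruction::Return => {
                ip = return_stack.pop().expect("Return stack underflow");
                continue;
            }
            Instruction::Halt => break,
        }
        ip += 1;
    }
    (stack, max_depth)
}

/// 2 fourth, where fourth = square square and square = dup *.
fn nested_calls() -> (Vec<i32>, usize) {
    run(&[
        Instruction::Push(2), // 0
        Instruction::Call(3), // 1: call fourth
        Instruction::Halt,    // 2
        Instruction::Call(6), // 3: fourth: first square
        Instruction::Call(6), // 4:         second square
        Instruction::Return,  // 5
        Instruction::Dup,     // 6: square
        Instruction::Mul,     // 7
        Instruction::Return,  // 8
    ])
}
```

Running nested_calls gives a final stack of [16] with the return stack reaching a depth of 2, one entry for the call into fourth and one for the call into square.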

Example: Square a Number

Let’s write a subroutine that squares the top value of the stack — like this:

: square dup * ;
5 square

Translated into VM instructions:

let program = vec![
    // main
    Instruction::Push(5),       // [5]
    Instruction::Call(3),       // jump to square
    Instruction::Halt,

    // square (addr 3)
    Instruction::Dup,           // [5, 5]
    Instruction::Mul,           // [25]
    Instruction::Return,
];

let mut vm = VM::new(program);
vm.run();
println!("Final stack: {:?}", vm.stack);

Expected output:

Final stack: [25]

If you accidentally used Call(5), you’d jump straight to Return and skip your routine completely. It’s a classic addressing bug that’s easy to spot once you think in terms of instruction addresses.

Conclusion

With these new control flow instructions, we’ve unlocked a huge amount of expressive power. Our VM can now:

  • Execute conditional logic
  • Jump forwards and backwards
  • Encapsulate and reuse stack behavior with subroutines

In the next part, we’ll take the leap into defining named words, allowing us to simulate real Forth syntax like:

: square dup * ;
5 square

We’ll build a dictionary, wire up some simple parsing, and move closer to an interactive REPL.

The code for this part is available here on GitHub.

Building a Stack-Based VM in Rust - Part 2

Introduction

In Part 1, we built the foundation of a Forth-inspired stack-based virtual machine in Rust. It could execute arithmetic expressions using a simple data stack, with support for operations like PUSH, ADD, MUL, and basic stack manipulation like DUP, DROP, and SWAP.

In this post, we’re going to extend our instruction set with a broader set of stack manipulation words, modeled after standard Forth operations.

Why focus on stack operations? Because in a language like Forth, the stack is everything. Understanding and manipulating it precisely is key to building complex programs — without variables, parentheses, or traditional control structures.

Stack Operations in Forth

Let’s take a look at some of the classic stack words used in Forth and what they do:

Word   Stack Effect            Description
OVER   ( a b -- a b a )        Copies the second value to the top
ROT    ( a b c -- b c a )      Rotates the third value to the top
NIP    ( a b -- b )            Removes the second item
TUCK   ( a b -- b a b )        Duplicates the top item under the second
2DUP   ( a b -- a b a b )      Duplicates the top two items
2DROP  ( a b -- )              Drops the top two items
2SWAP  ( a b c d -- c d a b )  Swaps the top two pairs
DEPTH  ( -- n )                Pushes the current stack depth

These tiny instructions are the building blocks for everything from loops and conditionals to data structures and control flow. Let’s implement them.

Extending the Instruction Set

First, we add new variants to our Instruction enum:

enum Instruction {
    Push(i32),
    Add,
    Mul,
    Dup,
    Drop,
    Swap,
    Over,
    Rot,
    Nip,
    Tuck,
    TwoDup,
    TwoDrop,
    TwoSwap,
    Depth,
    Halt,
}

Implementing the New Instructions

Each of these stack operations is implemented as a new match arm in our run() method. Here’s each new arm in turn:

OVER

Copies the second value from the top and pushes it to the top.

Stack effect: ( a b -- a b a )

Instruction::Over => {
    if self.stack.len() < 2 {
        panic!("Stack underflow on OVER");
    }
    let val = self.stack[self.stack.len() - 2];
    self.stack.push(val);
}

This implementation uses indexing to read the second-to-top value without popping. It’s a clean operation that doesn’t disturb the existing stack order — a very common primitive in Forth.

ROT

Rotates the third item to the top of the stack.

Stack effect: ( a b c -- b c a )

Instruction::Rot => {
    if self.stack.len() < 3 {
        panic!("Stack underflow on ROT");
    }
    let c = self.stack.pop().unwrap();
    let b = self.stack.pop().unwrap();
    let a = self.stack.pop().unwrap();
    self.stack.push(b);
    self.stack.push(c);
    self.stack.push(a);
}

We pop all three values, then push them back in rotated order. It’s a destructive operation — it reshuffles the top 3 items completely.

NIP

Removes the second item, leaving the top item alone.

Stack effect: ( a b -- b )

Instruction::Nip => {
    if self.stack.len() < 2 {
        panic!("Stack underflow on NIP");
    }
    let top = self.stack.pop().unwrap();
    self.stack.pop(); // discard second
    self.stack.push(top);
}

Here we temporarily save the top, discard the second, then restore the top. This is essentially “keep the top, drop the item beneath it.”

TUCK

Duplicates the top item and inserts it beneath the second.

Stack effect: ( a b -- b a b )

Instruction::Tuck => {
    if self.stack.len() < 2 {
        panic!("Stack underflow on TUCK");
    }
    let top = *self.stack.last().unwrap();
    self.stack.insert(self.stack.len() - 2, top);
}

We avoid popping by using last() and insert(). Inserting at len() - 2 puts the copy just beneath the second item, preserving the original order.
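The insert index is the part that's easy to second-guess, so here is the same logic pulled out into a free function and tried on a stack with an extra item underneath. The tuck function and tuck_example are written just for this illustration; they operate on a plain Vec<i32> rather than on the VM.

```rust
/// TUCK ( a b -- b a b ) on a bare Vec, using the same insert trick.
fn tuck(stack: &mut Vec<i32>) {
    assert!(stack.len() >= 2, "Stack underflow on TUCK");
    let top = *stack.last().unwrap();
    let idx = stack.len() - 2; // position of the second item
    stack.insert(idx, top);    // everything from idx onwards shifts up by one
}

/// Starts with [9, 1, 2] so we can see that deeper items are left alone.
fn tuck_example() -> Vec<i32> {
    let mut stack = vec![9, 1, 2];
    tuck(&mut stack);
    stack
}
```

Starting from [9, 1, 2], the result is [9, 2, 1, 2]: the 9 stays where it was and the copy of the top lands beneath the 1.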

2DUP

Duplicates the top two stack items.

Stack effect: ( a b -- a b a b )

Instruction::TwoDup => {
    if self.stack.len() < 2 {
        panic!("Stack underflow on 2DUP");
    }
    let len = self.stack.len();
    self.stack.push(self.stack[len - 2]);
    self.stack.push(self.stack[len - 1]);
}

We peek at the last two items and push duplicates in-place. It’s a straightforward double copy.

2DROP

Removes the top two items from the stack.

Stack effect: ( a b -- )

Instruction::TwoDrop => {
    if self.stack.len() < 2 {
        panic!("Stack underflow on 2DROP");
    }
    self.stack.pop();
    self.stack.pop();
}

Just two pops in a row. Very simple and direct.

2SWAP

Swaps the top two pairs on the stack.

Stack effect: ( a b c d -- c d a b )

Instruction::TwoSwap => {
    if self.stack.len() < 4 {
        panic!("Stack underflow on 2SWAP");
    }
    let d = self.stack.pop().unwrap();
    let c = self.stack.pop().unwrap();
    let b = self.stack.pop().unwrap();
    let a = self.stack.pop().unwrap();
    self.stack.push(c);
    self.stack.push(d);
    self.stack.push(a);
    self.stack.push(b);
}

This is the most complex so far. We destructure two pairs from the stack, then push them back in swapped order.

DEPTH

Pushes the number of elements currently on the stack.

Stack effect: ( -- n )

Instruction::Depth => {
    let depth = self.stack.len() as i32;
    self.stack.push(depth);
}

No stack input required. Just measure and push. Very handy for introspection or debugging.
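Every arm above repeats the same length check followed by a panic. If you'd rather report underflow than crash, one option is a small helper that returns a Result. This is only a sketch of that alternative: require, over and two_swap are free functions invented here, not part of the VM in this series, and two_swap uses Vec::swap instead of four pops and four pushes.

```rust
/// Hypothetical helper: fails with a message naming the word that underflowed.
fn require(depth: usize, needed: usize, op: &str) -> Result<(), String> {
    if depth < needed {
        Err(format!("Stack underflow on {}", op))
    } else {
        Ok(())
    }
}

/// OVER ( a b -- a b a )
fn over(stack: &mut Vec<i32>) -> Result<(), String> {
    require(stack.len(), 2, "OVER")?;
    let val = stack[stack.len() - 2];
    stack.push(val);
    Ok(())
}

/// 2SWAP ( a b c d -- c d a b ), done in place with two element swaps.
fn two_swap(stack: &mut Vec<i32>) -> Result<(), String> {
    require(stack.len(), 4, "2SWAP")?;
    let len = stack.len();
    stack.swap(len - 4, len - 2); // a <-> c
    stack.swap(len - 3, len - 1); // b <-> d
    Ok(())
}
```

On underflow the stack is left untouched and the caller gets an error it can print or propagate, which could come in handy once there is a REPL.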

Example: Forth-ish Stack Dance

Let’s build a small program using some of these new instructions:

let program = vec![
    Instruction::Push(1),
    Instruction::Push(2),
    Instruction::Push(3),
    Instruction::Rot,      // [2, 3, 1]
    Instruction::Over,     // [2, 3, 1, 3]
    Instruction::Add,      // [2, 3, 4]
    Instruction::TwoDup,   // [2, 3, 4, 3, 4]
    Instruction::Swap,     // [2, 3, 4, 4, 3]
    Instruction::TwoDrop,  // [2, 3, 4]
    Instruction::Depth,    // [2, 3, 4, 3]
    Instruction::Halt,
];

let mut vm = VM::new(program);
vm.run();
println!("Final stack: {:?}", vm.stack);

The final stack should look like this:

[2, 3, 4, 3]

That last 3 is the result of DEPTH, reporting how many values were on the stack before it was called.

Conclusion

With just a few additional instructions, our little VM has become much more expressive. We’ve added powerful new tools to inspect, duplicate, and reorder values on the stack — just like a real Forth environment.

This kind of “stack choreography” might feel alien at first, but it’s deeply intuitive once you start thinking in terms of data flow. It’s the perfect foundation for:

  • Building control structures
  • Defining new words
  • Supporting conditionals and loops
  • Creating a REPL

And that’s where we’re headed next.

The code for this part is available up on my GitHub.