Building a Stack-Based VM in Rust - Part 5
25 May 2025
Introduction
In Part 4, we introduced named words and a dictionary that allowed our VM to call subroutines by name. But there was still one major gap:
We couldn’t write Forth-like code.
You still had to build `vec![Instruction::Push(5), ...]` by hand in Rust. That changes now.
In this post, we’ll add a hand-rolled parser that understands simple Forth syntax, including word definitions like `: square dup * ;`, and emits instructions automatically.
By the end of this part, you’ll be able to write:
```forth
5 square : square dup * ;
```
and run it on your virtual machine with no hardcoded addresses or manual instruction building.
The Goal
Our parser will:
- Tokenize simple Forth input
- Track whether we’re inside a `:` definition
- Split instructions into `main` and `definitions`
- Insert a `Halt` after top-level code to prevent fall-through
- Track the correct addresses for word definitions
- Build a final list of instructions ready to run
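To make the first goal concrete, here is a small sketch of what tokenizing the example program yields. The `Token` enum and `tokenize` function are hypothetical names used only for illustration: the parser in this post matches on raw `&str` tokens directly, but it distinguishes these same four cases.

```rust
// Hypothetical token type for illustration only; the post's parser works on
// raw &str tokens, but separates exactly these categories.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Colon,         // ":" opens a definition
    Semicolon,     // ";" closes a definition
    Number(i32),   // any token that parses as an i32
    Word(&'a str), // a built-in or user-defined word
}

// Split on whitespace, then classify each token.
fn tokenize(input: &str) -> Vec<Token<'_>> {
    input
        .split_whitespace()
        .map(|t| match t {
            ":" => Token::Colon,
            ";" => Token::Semicolon,
            _ => match t.parse::<i32>() {
                Ok(n) => Token::Number(n),
                Err(_) => Token::Word(t),
            },
        })
        .collect()
}

fn main() {
    for token in tokenize("5 square : square dup * ;") {
        println!("{:?}", token);
    }
}
```

Because `split_whitespace` treats any run of spaces, tabs, or newlines as a single separator, multi-line programs tokenize the same way as one-liners.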
Let’s walk through the updated code piece by piece.
Parser Support
First of all, we define our `Parser` struct, which keeps the parsing logic separate from the VM runtime.
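```rust
use std::collections::HashMap;

// `Instruction` is the enum defined in earlier parts of this series.
struct Parser {
    main: Vec<Instruction>,             // top-level code
    definitions: Vec<Instruction>,      // bodies of `: word ... ;` definitions
    dictionary: HashMap<String, usize>, // word name -> start address
}
```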
Here’s what each member does:
- `main`: Top-level code that runs first. This is where calls like `5 square` are emitted.
- `definitions`: Instructions belonging to `: word ... ;` definitions.
- `dictionary`: A mapping of word names (like `"square"`) to their starting address in the final instruction stream.
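As a concrete picture of these three members, the sketch below writes out by hand what they should hold after parsing `5 square : square dup * ;`. The `Instruction` enum here is a minimal stand-in for the one from earlier parts (an assumption: only the variants this example needs), and `example_state` is a hypothetical helper.

```rust
use std::collections::HashMap;

// Minimal stand-in for the Instruction enum from earlier parts (assumption).
#[derive(Debug, PartialEq)]
enum Instruction {
    Push(i32),
    Dup,
    Mul,
    CallWord(String),
    Return,
}

// The three Parser members after parsing "5 square : square dup * ;",
// written out by hand.
fn example_state() -> (Vec<Instruction>, Vec<Instruction>, HashMap<String, usize>) {
    let main_code = vec![Instruction::Push(5), Instruction::CallWord("square".to_string())];
    let definitions = vec![Instruction::Dup, Instruction::Mul, Instruction::Return];
    let mut dictionary = HashMap::new();
    // main_code occupies indices 0..2 and the Halt sits at 2, so "square" starts at 3.
    dictionary.insert("square".to_string(), main_code.len() + 1);
    (main_code, definitions, dictionary)
}

fn main() {
    let (main_code, definitions, dictionary) = example_state();
    println!("main:        {:?}", main_code);
    println!("definitions: {:?}", definitions);
    println!("dictionary:  {:?}", dictionary);
}
```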
We initialize the parser with empty sections:
```rust
impl Parser {
    fn new() -> Self {
        Self {
            main: Vec::new(),
            definitions: Vec::new(),
            dictionary: HashMap::new(),
        }
    }
}
```
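Since every field starts out empty, an alternative is to derive `Default` and have `new` delegate to it. This is a sketch of that design choice, not the post’s code; the one-variant `Instruction` enum is a placeholder assumption so the block compiles on its own.

```rust
use std::collections::HashMap;

// Placeholder for the real Instruction enum (assumption).
#[allow(dead_code)]
#[derive(Debug)]
enum Instruction {
    Push(i32),
}

// Vec and HashMap both implement Default as "empty", so the derived
// Default builds the same empty Parser that the hand-written new() does.
#[derive(Default)]
struct Parser {
    main: Vec<Instruction>,
    definitions: Vec<Instruction>,
    dictionary: HashMap<String, usize>,
}

impl Parser {
    fn new() -> Self {
        Self::default()
    }
}

fn main() {
    let parser = Parser::new();
    println!(
        "{} {} {}",
        parser.main.len(),
        parser.definitions.len(),
        parser.dictionary.len()
    );
}
```

The derived version stays correct automatically if you later add another field whose type implements `Default`.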
Token Parsing
The heart of the parser is the `parse` method. We split the input on whitespace and interpret each token in turn.
```rust
impl Parser {
    fn parse(&mut self, input: &str) {
        let mut tokens = input.split_whitespace().peekable();
        // Name of the word being defined while we're inside `: ... ;`
        let mut defining: Option<String> = None;
        // Instructions collected for the current definition
        let mut buffer: Vec<Instruction> = Vec::new();

        while let Some(token) = tokens.next() {
            match token {
                ":" => {
                    // Beginning of a word definition
                    let name = tokens.next().expect("Expected word name after ':'");
                    defining = Some(name.to_string());
                    buffer.clear();
                }
                ";" => {
                    // End of word definition
                    if let Some(name) = defining.take() {
                        buffer.push(Instruction::Return);
                        // Start address = current top-level code + Halt + earlier definitions
                        let addr = self.main.len() + self.definitions.len() + 1; // +1 for the Halt
                        self.dictionary.insert(name, addr);
                        self.definitions.extend(buffer.drain(..));
                    } else {
                        panic!("Unexpected ';' outside of word definition");
                    }
                }
                word => {
                    // Otherwise, parse an instruction: numbers first, then built-ins
                    let instr = if let Ok(n) = word.parse::<i32>() {
                        Instruction::Push(n)
                    } else {
                        match word {
                            "dup" => Instruction::Dup,
                            "drop" => Instruction::Drop,
                            "swap" => Instruction::Swap,
                            "over" => Instruction::Over,
                            "+" => Instruction::Add,
                            "*" => Instruction::Mul,
                            "depth" => Instruction::Depth,
                            _ => Instruction::CallWord(word.to_string()),
                        }
                    };
                    // Add to the appropriate section
                    if defining.is_some() {
                        buffer.push(instr);
                    } else {
                        self.main.push(instr);
                    }
                }
            }
        }

        // A definition that never reached `;` would otherwise be silently dropped
        if let Some(name) = defining {
            panic!("Unterminated definition of '{}'", name);
        }
    }
}
```
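The parser reports malformed input by panicking, which is fine for a demo. If you would rather let callers recover from bad input, one option is to return a `Result`. The sketch below is illustrative only: `ParseError` and `defined_words` are hypothetical names, and it walks just the `:` / `;` state machine (returning the names it saw defined) rather than emitting instructions.

```rust
// Hypothetical error type; the parser in this post panics instead.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingName,          // ":" was the last token
    UnexpectedSemicolon,  // ";" outside a definition
    Unterminated(String), // ": name ..." never reached ";"
}

// Same ":" / ";" state machine as `parse`, but problems come back as Err
// values. Instruction emission is left out to keep the sketch short.
fn defined_words(input: &str) -> Result<Vec<String>, ParseError> {
    let mut tokens = input.split_whitespace();
    let mut defining: Option<String> = None;
    let mut names = Vec::new();

    while let Some(token) = tokens.next() {
        match token {
            ":" => {
                let name = tokens.next().ok_or(ParseError::MissingName)?;
                defining = Some(name.to_string());
            }
            ";" => {
                let name = defining.take().ok_or(ParseError::UnexpectedSemicolon)?;
                names.push(name);
            }
            _ => {} // numbers and words would be turned into instructions here
        }
    }

    // Reject a definition that is still open at the end of the input.
    match defining {
        Some(name) => Err(ParseError::Unterminated(name)),
        None => Ok(names),
    }
}

fn main() {
    println!("{:?}", defined_words("5 square : square dup * ;"));
    println!("{:?}", defined_words("1 2 ;"));
}
```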
Breakdown of the cases:
- `:` begins a new named word definition.
- `;` ends the definition, emits a `Return`, and stores the word’s starting address in the dictionary.
- A number becomes a `Push(n)` instruction.
- Built-in words like `+` and `*` become direct `Instruction` variants.
- Any unknown token is assumed to be a user-defined word and is translated to `CallWord("name")`.
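The per-token rules in that list can be pulled out into a standalone function, which makes their ordering easy to test. This is a sketch: `classify` is a hypothetical helper, and the `Instruction` enum is a stand-in built from the variants the `parse` method uses, assuming `Push` holds an `i32`.

```rust
// Stand-in for the VM's Instruction enum (assumption: variants taken from
// the parse method, with Push holding an i32).
#[derive(Debug, PartialEq)]
enum Instruction {
    Push(i32),
    Dup,
    Drop,
    Swap,
    Over,
    Add,
    Mul,
    Depth,
    CallWord(String),
}

// Same decision order as the `word =>` arm: numbers first, then built-ins,
// and anything else becomes a call to a user-defined word.
fn classify(word: &str) -> Instruction {
    if let Ok(n) = word.parse::<i32>() {
        return Instruction::Push(n);
    }
    match word {
        "dup" => Instruction::Dup,
        "drop" => Instruction::Drop,
        "swap" => Instruction::Swap,
        "over" => Instruction::Over,
        "+" => Instruction::Add,
        "*" => Instruction::Mul,
        "depth" => Instruction::Depth,
        _ => Instruction::CallWord(word.to_string()),
    }
}

fn main() {
    for word in ["5", "-3", "+", "dup", "square", "DUP"] {
        println!("{:>6} -> {:?}", word, classify(word));
    }
}
```

Because numbers are tried first, `-3` becomes `Push(-3)`, while a bare `+` is not a valid `i32` and falls through to `Add`. The match is case-sensitive, so `DUP` is compiled as a call to a user-defined word named `DUP`.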
Finalizing the Program
Once parsing is complete, we combine the `main` program with the definitions, separated by a `Halt` so that execution doesn’t fall through into the word bodies.
```rust
impl Parser {
    fn finalize(self) -> (Vec<Instruction>, HashMap<String, usize>) {
        let mut instructions = self.main;
        instructions.push(Instruction::Halt); // stop after the top-level code
        instructions.extend(self.definitions); // word bodies follow the Halt
        (instructions, self.dictionary)
    }
}
```
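One caveat: the address recorded at `;` is computed from `self.main.len()` at that moment, so it is only right while all top-level code comes before the first definition, as in `5 square : square dup * ;`. For `5 square : square dup * ; 3 square`, `square` would be recorded at 3, but after finalizing, index 3 holds the second `CallWord("square")`, `Halt` sits at 4, and `Dup` at 5. One way around this, sketched below under the assumption that you are free to restructure `Parser`, is to record each word’s offset within `definitions` and add the base address in `finalize`, once the length of `main` is final. The `offsets` field, `end_definition`, and `example` are hypothetical names.

```rust
use std::collections::HashMap;

// Stand-in Instruction enum (assumption: only the variants used here).
#[derive(Debug, PartialEq)]
enum Instruction {
    Push(i32),
    Dup,
    Mul,
    CallWord(String),
    Return,
    Halt,
}

// Hypothetical Parser variant that stores offsets into `definitions`
// instead of final addresses.
#[derive(Default)]
struct Parser {
    main: Vec<Instruction>,
    definitions: Vec<Instruction>,
    offsets: HashMap<String, usize>,
}

impl Parser {
    // What the ";" branch would do: remember where the body starts within
    // `definitions`, then append the body and its Return.
    fn end_definition(&mut self, name: String, body: Vec<Instruction>) {
        self.offsets.insert(name, self.definitions.len());
        self.definitions.extend(body);
        self.definitions.push(Instruction::Return);
    }

    fn finalize(self) -> (Vec<Instruction>, HashMap<String, usize>) {
        // Definitions start after all top-level code plus the Halt,
        // which is only known once parsing is finished.
        let base = self.main.len() + 1;
        let mut instructions = self.main;
        instructions.push(Instruction::Halt);
        instructions.extend(self.definitions);
        let dictionary = self
            .offsets
            .into_iter()
            .map(|(name, offset)| (name, base + offset))
            .collect();
        (instructions, dictionary)
    }
}

// Hand-built equivalent of parsing "5 square : square dup * ; 3 square".
fn example() -> (Vec<Instruction>, HashMap<String, usize>) {
    let mut p = Parser::default();
    p.main.push(Instruction::Push(5));
    p.main.push(Instruction::CallWord("square".to_string()));
    p.end_definition("square".to_string(), vec![Instruction::Dup, Instruction::Mul]);
    p.main.push(Instruction::Push(3));
    p.main.push(Instruction::CallWord("square".to_string()));
    p.finalize()
}

fn main() {
    let (instructions, dictionary) = example();
    for (i, instr) in instructions.iter().enumerate() {
        println!("{}: {:?}", i, instr);
    }
    println!("square -> {}", dictionary["square"]);
}
```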
Main Program
Our `main()` function now uses the parser to construct the program from a Forth-style string.
```rust
fn main() {
    let mut parser = Parser::new();
    parser.parse("5 square : square dup * ;");
    let (instructions, dictionary) = parser.finalize();

    // Dump the generated program
    for instr in &instructions {
        println!("{:?}", instr);
    }

    // Run it on the VM from earlier parts, giving it the word addresses
    let mut vm = VM::new(instructions);
    vm.dictionary = dictionary;
    vm.run();
    println!("Final stack: {:?}", vm.stack);
}
```
You should see the following output:
```text
Push(5)
CallWord("square")
Halt
Dup
Mul
Return
Final stack: [25]
```
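To see why that listing ends with `[25]`, here is a minimal stand-in VM that runs exactly those six instructions with `square` at address 3. It is an assumption for illustration: the real `VM` from earlier parts may differ, for example in how it stores return addresses or reports errors. Tracing it: `Push(5)` leaves `[5]`; `CallWord("square")` saves return address 2 and jumps to 3; `Dup` gives `[5, 5]`; `Mul` gives `[25]`; `Return` jumps back to 2, where `Halt` stops execution.

```rust
use std::collections::HashMap;

// Minimal stand-ins for the Instruction enum and VM from earlier parts
// (assumptions for illustration; the real VM may differ).
#[derive(Debug, Clone, PartialEq)]
enum Instruction {
    Push(i32),
    Dup,
    Mul,
    CallWord(String),
    Return,
    Halt,
}

struct VM {
    instructions: Vec<Instruction>,
    stack: Vec<i32>,
    call_stack: Vec<usize>, // return addresses
    dictionary: HashMap<String, usize>,
    ip: usize, // index of the next instruction to execute
}

impl VM {
    fn new(instructions: Vec<Instruction>) -> Self {
        VM {
            instructions,
            stack: Vec::new(),
            call_stack: Vec::new(),
            dictionary: HashMap::new(),
            ip: 0,
        }
    }

    fn run(&mut self) {
        while self.ip < self.instructions.len() {
            let instr = self.instructions[self.ip].clone();
            self.ip += 1;
            match instr {
                Instruction::Push(n) => self.stack.push(n),
                Instruction::Dup => {
                    let top = *self.stack.last().expect("stack underflow");
                    self.stack.push(top);
                }
                Instruction::Mul => {
                    let b = self.stack.pop().expect("stack underflow");
                    let a = self.stack.pop().expect("stack underflow");
                    self.stack.push(a * b);
                }
                Instruction::CallWord(name) => {
                    let addr = *self.dictionary.get(&name).expect("unknown word");
                    self.call_stack.push(self.ip); // resume after the call
                    self.ip = addr;
                }
                Instruction::Return => {
                    self.ip = self.call_stack.pop().expect("Return without a call");
                }
                Instruction::Halt => break,
            }
        }
    }
}

// Runs the listing printed above, with "square" at address 3.
fn run_listing() -> Vec<i32> {
    let instructions = vec![
        Instruction::Push(5),
        Instruction::CallWord("square".to_string()),
        Instruction::Halt,
        Instruction::Dup,
        Instruction::Mul,
        Instruction::Return,
    ];
    let mut vm = VM::new(instructions);
    vm.dictionary.insert("square".to_string(), 3);
    vm.run();
    vm.stack
}

fn main() {
    println!("Final stack: {:?}", run_listing());
}
```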
Conclusion
This was a big leap forward: we now parse and run real Forth-like programs, entirely from text.
The parser separates top-level code from definitions, calculates each word’s starting address, inserts a `Halt`, and builds a dictionary of reusable named words.
We now have:
- A working VM
- An extensible instruction set
- Named words and subroutines
- A parser for Forth-style input
The code for this part is available on GitHub.