Cogs and Levers A blog full of technical stuff

Making a REPL with NASM and glibc

In the previous article we learned something important:

Assembly becomes dramatically more productive the moment you stop rewriting libc.

Printing text, formatting numbers, comparing strings, and handling input are already solved problems — and they’ve been solved extremely well.

Now we’re going to push that idea to its natural conclusion. We are going to write a real interactive program in pure assembly. A program that stays alive, reads commands, parses arguments, and performs actions.

In other words — a REPL.

By the end, this will work:

> help
commands: help add quit

> add 5 7
12

> add 1 2
3

> what
unknown command

> quit
bye

And we still won’t write a single syscall.

The full code listing for this article can be found here. We will be covering this code, piece by piece.

The Shape of the Program

Before writing any code, we need to understand the structure.

A REPL is just a loop:

  1. print a prompt
  2. read a line
  3. decide what it means
  4. run a handler
  5. repeat

There is no magic here. High level languages don’t provide REPLs — they just hide loops.

In assembly, we simply write the loop ourselves.

External Functions

We will use these glibc functions:

  • printf — formatted output
  • getline — dynamic input
  • strcmp — command matching
  • atoi — integer parsing
  • free — memory ownership

Let’s declare them.

BITS 64
DEFAULT REL

extern printf
extern getline
extern strcmp
extern atoi
extern free

global main

Exactly like before, these symbols exist inside glibc and will be resolved at link time.

Static Data

We now define the strings our program will use.

section .rodata
prompt      db "> ", 0
bye_msg     db "bye", 10, 0
unk_msg     db "unknown command", 10, 0
help_msg    db "commands: help add quit", 10, 0
add_fmt     db "%d", 10, 0

cmd_help    db "help", 0
cmd_add     db "add", 0
cmd_quit    db "quit", 0

This is exactly like C string constants — null terminated and stored in read-only memory.

Writable Storage

We now need somewhere to store input state.

getline allocates memory for us, but we must own the pointer.

section .bss
lineptr     resq 1
linesize    resq 1

This is important.

getline does not return a string.

It fills a pointer that we provide.
That pointer may be reallocated between calls.

So we must store it globally.

Program Entry

We now write main.

section .text
main:
  push rbp
  mov  rbp, rsp

We create a normal stack frame. Not strictly required — but keeps debugging sane and mirrors C expectations.

Now we initialise the buffer state.

mov qword [lineptr], 0
mov qword [linesize], 0

This tells getline:

I do not own a buffer yet — please allocate one.

The REPL Loop

Here is the heart of the program.

repl:

A label is all a loop really is.

Printing the Prompt

lea rdi, [rel prompt]
xor eax, eax
call printf wrt ..plt

We load the format string into rdi.

Why xor eax, eax?

Because printf is variadic.
The System V ABI requires rax to contain the number of vector registers used — zero in our case.

C hides this rule. Assembly makes you honest.

Reading a Line

lea rdi, [rel lineptr]
lea rsi, [rel linesize]
mov rdx, [rel stdin]
call getline wrt ..plt

getline signature:

ssize_t getline(char **lineptr, size_t *n, FILE *stream);

So we pass:

register value
rdi pointer to buffer pointer
rsi pointer to size
rdx stdin

This function may:

  • allocate memory
  • grow memory
  • reuse memory

Which means:

We must eventually call free.

Extract Command

We now compare the input against commands.

mov rdi, [lineptr]
lea rsi, [rel cmd_help]
call strcmp wrt ..plt
test eax, eax
je do_help

strcmp returns zero when equal.

So we branch. This is effectively our switch and case.

Unknown Command Fallback

lea rdi, [rel unk_msg]
xor eax, eax
call printf wrt ..plt
jmp repl

This is our default case.

Help Command

do_help:
lea rdi, [rel help_msg]
xor eax, eax
call printf wrt ..plt
jmp repl

No surprises — just structured control flow.

Assembly is not chaotic.
It just doesn’t auto-indent for you.

Quit Command

mov rdi, [lineptr]
lea rsi, [rel cmd_quit]
call strcmp wrt ..plt
test eax, eax
je do_quit
do_quit:
  lea rdi, [rel bye_msg]
  xor eax, eax
  call printf wrt ..plt

  mov rdi, [lineptr]
  call free wrt ..plt

  xor eax, eax
  leave
  ret

Here we finally release memory ownership.

This is the most important rule in the entire article:

If libc allocates it, libc expects you to free it.

Assembly didn’t make this hard — ignoring ownership did.

Add Command (Parsing Arguments)

Now the interesting part.

We skip "add " and parse numbers.

do_add:
mov rbx, [lineptr]
add rbx, 4

We manually advance past "add ".

This is literally what C does internally. Now we process the first number.

mov rdi, rbx
call atoi wrt ..plt
mov r12d, eax

atoi converts text to integer.

We store it in a preserved register. Now we’ll look for the second parameter.

find_space:
cmp byte [rbx], 0
je repl
cmp byte [rbx], ' '
je found_space
inc rbx
jmp find_space

found_space:
inc rbx

We manually walk the string.

This is what string parsing actually is:
a loop and a condition.

mov rdi, rbx
call atoi wrt ..plt
add eax, r12d

Now we have the result.

mov esi, eax
lea rdi, [rel add_fmt]
xor eax, eax
call printf wrt ..plt
jmp repl

And the loop continues.

Building

Same as before.

nasm -felf64 repl.asm -o repl.o
gcc repl.o -o repl

What We Actually Built

We did not implement:

  • input buffering
  • dynamic allocation
  • number parsing
  • formatted output
  • terminal handling

Yet this is undeniably a real interactive program.

The difference between C and assembly is not capability.

It is visibility.

C hides the machine.
Assembly exposes it.
glibc carries the weight in both cases.

Conclusion

Assembly feels impossible when you try to do everything yourself.

But real programs were never written that way — not even in the 1970s.

They were written as small pieces of logic sitting on top of shared libraries.

That’s exactly what we built here.

Getting More Productive With NASM and glibc

Writing “pure syscall” assembly can be fun and educational — right up until you find yourself rewriting strlen, strcmp, line input, formatting, and file handling for the tenth time.

If you’re building tooling (monitors, debuggers, CLIs, experiments), the fastest path is often to write your core logic in assembly and call out to glibc for the boring parts.

In today’s article, we’ll walk through a basic example to get you up and running. You should quickly see just how thin the C language really is as a layer over assembly and the machine itself.

A full version of what we’ll build here today can be found here.

Hello, world

We’ll start with a simple “Hello, world” style application.

BITS 64
DEFAULT REL

extern puts
global main

section .rodata
msg db "Hello from NASM + glibc (puts)!", 0

section .text
main:
  ; puts(const char *s)
  lea   rdi, [rel msg]
  call  puts wrt ..plt      ; <-- PIE-friendly call via PLT

  xor   eax, eax            ; return 0
  ret

Let’s break this down.

BITS 64
DEFAULT REL

First, we tell the assembler that we’re generating code for x86-64 using the BITS directive.

DEFAULT REL changes the default addressing mode in 64-bit assembly from absolute addressing to RIP-relative addressing. This is an important step when writing modern position-independent code (PIC), and allows the resulting executable to work correctly with security features like Address Space Layout Randomisation (ASLR).

extern puts

Functions that are implemented outside our module are resolved at link time. Since the implementation of puts lives inside glibc, we declare it as an external symbol.

global main

The true entry point of a Linux program is _start. When you write a fully standalone binary, you need to define this yourself.

Because we’re linking against glibc, the C runtime provides the startup code for us. Internally, this eventually calls our main function. To make this work, we simply mark main as global so the linker can find it.

section .rodata
msg db "Hello from NASM + glibc (puts)!", 0

Here we define our string in the read-only data section (.rodata). From a C perspective, this is equivalent to storing a const char *.

section .text
main:

This marks the beginning of our executable code and defines the main entry point.

  lea   rdi, [rel msg]
  call  puts wrt ..plt

This is where we actually print the message.

According to the x86-64 System V ABI (used by Linux and glibc), function arguments are passed in registers using the following order:

  • rdi
  • rsi
  • rdx
  • rcx
  • r8
  • r9

Floating-point arguments are passed in XMM registers.

We load the address of our string into rdi, then call puts.

The wrt ..plt modifier tells NASM to generate a call through the Procedure Linkage Table (PLT). This is required for producing position-independent executables (PIE), which are the default on many modern Linux systems. Without this, the linker may fail or produce non-relocatable binaries.

xor   eax, eax
ret

Finally, we return zero from main by clearing eax. Control then returns to glibc, which performs cleanup and exits back to the operating system.

Building

We first assemble the file into an object file:

nasm -felf64 hello.asm -o hello.o

Next, we link it using gcc. This automatically pulls in glibc and the required runtime startup code:

gcc hello.o -o hello

On many modern Linux distributions, position-independent executables are enabled by default. If you encounter relocation errors during linking, you can explicitly enable PIE support:

gcc -fPIE -pie hello.o -o hello

Or temporarily disable it while experimenting:

gcc -no-pie hello.o -o hello

The PLT-based call form shown earlier works correctly in both cases.

Conclusion

Calling glibc from NASM is one of those “unlock” moments.

You retain full control over registers, memory layout, and calling conventions — while gaining access to decades of well-tested functionality for free.

Instead of rewriting basic infrastructure, you can focus your energy on the interesting low-level parts of your project.

For tools like debuggers, monitors, loaders, and CLIs, this hybrid approach often provides the best balance between productivity and control.

In the next article, we’ll build a small interactive REPL in NASM using getline, strcmp, and printf, and start layering real debugger-style functionality on top.

Assembly doesn’t have to be painful — it just needs the right leverage.

Creating extensions in Rust for PostgreSQL

In a previous post I walked through building PostgreSQL extensions in C. It worked, but the process reminded me why systems programming slowly migrated away from raw C for anything larger than a weekend hack. Writing even a trivial function required boilerplate macros, juggling PG_FUNCTION_ARGS, and carefully tiptoeing around memory contexts.

This time, we’re going to do the same thing again — but in Rust.

Using the pgrx framework, you can build fully-native Postgres extensions with:

  • no hand-written SQL wrappers
  • no PGXS Makefiles
  • no manual tuple construction
  • no palloc/pfree memory management
  • a hot-reloading development Postgres
  • and zero unsafe code unless you choose to use it

Let’s walk through the entire process: installing pgrx, creating a project, adding a function, and calling it from Postgres.


1. Installing pgrx

Install the pgrx cargo subcommand:

cargo install --locked cargo-pgrx

Before creating an extension, pgrx needs to know which versions of Postgres you want to target.
Since I’m running PostgreSQL 17, I simply asked pgrx to download and manage its own copy:

cargo pgrx init --pg17 download

This is important.

Instead of installing into /usr/share/postgresql (which requires root and is generally a bad idea), pgrx keeps everything self-contained under:

~/.pgrx/17.x/pgrx-install/

This gives you:

  • a private Postgres 17 instance
  • a writable extension directory
  • zero interference with your system Postgres
  • a smooth, reproducible development environment

2. Creating a New Extension

With pgrx initialised, create a new project:

cargo pgrx new hello_rustpg
cd hello_rustpg

This generates a full extension layout:

Cargo.toml
src/lib.rs
sql/hello_rustpg.sql
hello_rustpg.control

When you compile the project, pgrx automatically generates SQL wrappers and installs everything into its own Postgres instance.


3. A Minimal Rust Function

Open src/lib.rs and add:

use pgrx::prelude::*;

pgrx::pg_module_magic!();

#[pg_extern]
fn hello_rustpg() -> &'static str {
    "Hello from Rust + pgrx on Postgres 17!"
}

That’s all you need.
pgrx generates the SQL wrapper for you, handles type mapping, and wires everything into Postgres.


4. Running It Inside Postgres

Start your pgrx-managed Postgres 17 instance:

cargo pgrx run pg17

Inside psql:

CREATE EXTENSION hello_rustpg;
SELECT hello_rustpg();

Result:

 hello_rustpg            
-------------------------------
 Hello from Rust + pgrx on Postgres 17!
(1 row)

Done. A working native extension — no Makefiles, no C, no segfaults.


5. Returning a Table From Rust

Let’s do something a little more interesting: return rows.

Replace your src/lib.rs with:

use pgrx::prelude::*;
use pgrx::spi::SpiResult;

pgrx::pg_module_magic!(name, version);

#[pg_extern]
fn hello_hello_rustpg() -> &'static str {
    "Hello, hello_rustpg"
}

#[pg_extern]
fn list_tables() -> TableIterator<'static, (name!(schema, String), name!(table, String))> {
    let sql = "
        SELECT schemaname::text AS schemaname,
               tablename::text AS tablename
        FROM pg_tables
        WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
        ORDER BY schemaname, tablename;
    ";

    let rows = Spi::connect(|client| {
        client
            .select(sql, None, &[])?
            .map(|row| -> SpiResult<(String, String)> {
                let schema: Option<String> = row["schemaname"].value()?;
                let table: Option<String>  = row["tablename"].value()?;

                Ok((schema.expect("schemaname null"),
                    table.expect("tablename null")))
            })
            .collect::<SpiResult<Vec<_>>>()
    })
    .expect("SPI failed");

    TableIterator::new(rows.into_iter())
}

Re-run:

cargo pgrx run pg17

Then:

SELECT * FROM list_tables();

If you don’t have any tables, your list will be empty. Otherwise you’ll see something like:

 schema |    table    
--------+-------------
 public | names
 public | order_items
 public | orders
 public | users
(4 rows)

This is the point where Rust starts to feel like cheating:
you’re returning tuples without touching TupleDesc, heap_form_tuple(), or any of Postgres’s internal APIs.


6. Accessing Catalog Metadata (Optional but Fun)

Here’s one more example: listing foreign keys.

#[pg_extern]
fn list_foreign_keys() -> TableIterator<
    'static,
    (
        name!(table_name, String),
        name!(column_name, String),
        name!(foreign_table_name, String),
        name!(foreign_column_name, String),
    ),
> {
    let sql = r#"
        SELECT
            tc.table_name::text        AS table_name,
            kcu.column_name::text      AS column_name,
            ccu.table_name::text       AS foreign_table_name,
            ccu.column_name::text      AS foreign_column_name
        FROM information_schema.table_constraints AS tc
        JOIN information_schema.key_column_usage AS kcu
            ON tc.constraint_name = kcu.constraint_name
           AND tc.table_schema = kcu.table_schema
        JOIN information_schema.constraint_column_usage AS ccu
            ON ccu.constraint_name = tc.constraint_name
           AND ccu.table_schema = tc.table_schema
        WHERE tc.constraint_type = 'FOREIGN KEY'
        ORDER BY tc.table_name, kcu.column_name;
    "#;

    let rows = Spi::connect(|client| {
        client
            .select(sql, None, &[])?
            .map(|row| -> SpiResult<(String, String, String, String)> {
                let t:  Option<String> = row["table_name"].value()?;
                let c:  Option<String> = row["column_name"].value()?;
                let ft: Option<String> = row["foreign_table_name"].value()?;
                let fc: Option<String> = row["foreign_column_name"].value()?;

                Ok((t.expect("null"), c.expect("null"), ft.expect("null"), fc.expect("null")))
            })
            .collect::<SpiResult<Vec<_>>>()
    })
    .expect("SPI failed");

    TableIterator::new(rows.into_iter())
}

In psql:

SELECT * FROM list_foreign_keys();

Example output:

 table_name  | column_name | foreign_table_name | foreign_column_name 
-------------+-------------+--------------------+---------------------
 order_items | order_id    | orders             | id
 orders      | user_id     | users              | id
(2 rows)

This begins to show how easy it is to build introspection tools — or even something more adventurous, like treating your relational schema as a graph.


7. Testing in Rust

pgrx includes a brilliant test harness.

Add this:

#[cfg(any(test, feature = "pg_test"))]
#[pg_schema]
mod tests {
    use super::*;
    use pgrx::prelude::*;

    #[pg_test]
    fn test_hello_rustpg() {
        assert_eq!(hello_rustpg(), "Hello from Rust + pgrx on Postgres 17!");
    }
}

/// Required by `cargo pgrx test`
#[cfg(test)]
pub mod pg_test {
    pub fn setup(_opts: Vec<&str>) {}
    pub fn postgresql_conf_options() -> Vec<&'static str> { vec![] }
}

Then run:

cargo pgrx test pg17

These are real Postgres-backed tests.
It’s one of the biggest advantages of building extensions in Rust.


Conclusion

After building extensions in both C and Rust, I’m firmly in the Rust + pgrx camp.

You still get:

  • full access to Postgres internals
  • native performance
  • the ability to drop into unsafe when needed

But you also get:

  • safety
  • ergonomics
  • powerful testing
  • a private Postgres instance during development
  • drastically simpler code

In the next article I’ll push further and treat foreign keys as edges — effectively turning a relational schema into a graph.

But for now, this is a clean foundation:
a native PostgreSQL extension written in Rust, tested, and running on Postgres 17.

Loading dynamic libraries in Rust

Today’s post is going to be a quick demonstration of loading dynamic libraries at runtime in Rust.

In my earlier article, I showed how to use Glibc’s dlopen/dlsym/dlclose APIs from C to load a shared object off disk and call a function in it. Rust can do the same thing – with a bit more type safety – using:

This is not meant to be a full plugin framework, just a minimal “host loads a tiny library and calls one function” example, similar in spirit to the original C version.

A tiny library in Rust

We’ll start with a tiny dynamic library that exports one function, greet, which returns a C-style string:

cargo new --lib rust_greeter
cd rust_greeter

Edit Cargo.toml so that the library is built as a cdylib:

[package]
name = "rust_greeter"
version = "0.1.0"
edition = "2021"

[lib]
name = "test"                
crate-type = ["cdylib"]      

Now the library code in src/lib.rs:

use std::os::raw::c_char;

#[unsafe(no_mangle)]
pub extern "C" fn greet() -> *const c_char {
    static GREETING: &str = "Hello from Rust!\0";
    GREETING.as_ptr().cast()
}

The #[unsafe(no_mangle)] form marks the item (the function) as unsafe to call, and also forwards the nested no_mangle attribute exactly as written. This avoids needing unsafe fn syntax and keeps ABI-exported functions more visually consistent. It’s a small but nice modernisation that fits well when exposing C-compatible symbols from Rust.

Build:

cargo build --release

You’ll get:

target/release/libtest.so

Host program: loading the library with libloading

Create a new binary crate:

cargo new rust_host
cd rust_host

Add libloading to Cargo.toml:

[package]
name = "rust_host"
version = "0.1.0"
edition = "2021"

[dependencies]
libloading = "0.8"

And src/main.rs:

use std::error::Error;
use std::ffi::CStr;
use std::os::raw::c_char;

use libloading::{Library, Symbol};

type GreetFn = unsafe extern "C" fn() -> *const c_char;

fn main() -> Result<(), Box<dyn Error>> {
    unsafe {
        let lib = Library::new("./libtest.so")?;
        let greet: Symbol<GreetFn> = lib.get(b"greet\0")?;

        let raw = greet();
        let c_str = CStr::from_ptr(raw);
        let message = c_str.to_str()?;

        println!("{message}");
    }
    Ok(())
}

Before we can run any of this, we need to make sure the library is available to the host program. In order to do this, we simply copy over the library:

cp ../rust_greeter/target/release/libtest.so .

Just copy the so over to the host program folder.

Running cargo run prints:

$ cargo run                                     
   Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
    Running `target/debug/rust_host`
Hello from Rust!

Mapping back to the C version

When you look at this code, you can see that Library::new("./libtest.so") now takes the place of dlopen().

We can get to the symbol that we want to call with lib.get(b"greet\0") rather than dlsym(), and we clean everything up now by just dropping the library.

Platform notes

Keep in mind that I’ve written this code on my linux machine, so you’ll have different targets depending on the platform that you work from.

Platform Output
Linux libtest.so
macOS libtest.dylib
Windows test.dll

cdylib produces the correct format automatically.

Conclusion

We:

  • built a tiny Rust cdylib exporting a C-ABI function,
  • loaded it at runtime with libloading,
  • looked up a symbol by name, and
  • invoked it through a typed function pointer.

I guess this was just a modern update to an existing article.

Just like in the C post, this is a deliberately minimal skeleton — but enough to grow into a proper plugin architecture once you define a stable API between host and library.

Simulating a Scheduler

Introduction

Most operating systems books talk about schedulers as if they’re mysterious forces that magically decide which program runs next. You’ll usually see phrases like context switching, round-robin, priority queues, and tasks yielding the CPU—but the moment you try to understand how that works mechanically, the examples jump straight into kernel code, hardware timers, and interrupt controllers.

That’s… a lot.

Before touching real hardware, it’s far easier to learn the concept in a sandbox: a tiny world that behaves like an operating system, but without all the baggage. That’s what we’re going to build.

In this post, we’ll simulate a scheduler.

We’ll:

  • Define a small set of opcodes—like a micro-instruction set for “programs.”
  • Write a mini assembler that converts human-readable instructions to bytecodes.
  • Represent each program as a task inside a virtual machine.
  • Implement a scheduler that runs these tasks, switching between them to simulate concurrency.

The final result:

  • multiple tiny “programs”
  • all appearing to run at the same time
  • under the control of a scheduler you wrote

No threads. No OS dependencies. No unsafe code. Just pure, understandable logic.

By the time we reach the end, you’ll not only understand what a scheduler does—you’ll understand why it needs to exist, and the mental model will stay with you when you look at real operating systems later.

Let’s build a scheduler from scratch.

Programs

Before scheduling, we need programs. We need a way to define, write, and store programs so that we can load these into our computer.

Instruction Set

First up, we’re going to define a set of instructions that our “computer” will execute. We’ll define these as a well known common set of instructions as there’s a few components that will need to understand these. The virtual machine that we’ll get to will use these to interpret and perform action, but our assembler will have the job of converting those instructions into bytes on disk.

Here’s the instructions:

#[repr(u8)]
#[derive(Copy, Clone, Debug)]
pub enum Instruction {
    NOP = 0x00,
    PUSH = 0x01,
    LOAD = 0x02,
    ADD = 0x03,
    SUB = 0x04,
    PRINT = 0x05,
    SLEEP = 0x06,
    YIELD = 0x07,
    HALT = 0x08,
    WORK = 0x09,
    SETPRIO = 0x0A,
}

As you can see, our computer won’t do a lot. Its functionality isn’t the subject here though; we want to see how these instructions get scheduled.

We use repr(u8) to make casting to u8 a much smoother process; that will help our assembler later on change our defined instructions into programs on disk.

Conversely, we need to make the bytes-on-disk to instructions-in-memory translation easier.

impl Instruction {
    pub fn from_byte(byte: u8) -> Option<Self> {
        match byte {
            0x00 => Some(Instruction::NOP),
            0x01 => Some(Instruction::PUSH),
            0x02 => Some(Instruction::LOAD),
            0x03 => Some(Instruction::ADD),
            0x04 => Some(Instruction::SUB),
            0x05 => Some(Instruction::PRINT),
            0x06 => Some(Instruction::SLEEP),
            0x07 => Some(Instruction::YIELD),
            0x08 => Some(Instruction::HALT),
            0x09 => Some(Instruction::WORK),
            0x0A => Some(Instruction::SETPRIO),
            _ => None,
        }
    }
}

Assembling Programs

With a solid set of instructions defined, we can now write some code to take a list of instrucitons and write them to disk. We’d call this a binary.

pub fn translate_instructions_to_bytes(instructions: &Program) -> Vec<u8> {
    instructions
        .iter()
        .map(|i| *i as u8)
        .collect::<Vec<u8>>()
}

pub fn assemble_to_disk(instructions: &Program, path: &str) -> std::io::Result<()> {
    let bytes = translate_instructions_to_bytes(instructions);
    std::fs::write(path, bytes)
}

The Program type is simply a synonym for Vec<Instruction>.

We can use this now to assemble some programs:

let p1 = vec![
    Instruction::PUSH,
    Instruction::ADD,
    Instruction::SUB,
    Instruction::LOAD,
    Instruction::HALT,
];

assemble_to_disk(&p1, "./bins/p1.bin").expect("couldn't write binary");

Outside the context of our rust program, we can verify the content of the binary with hexdump:

$ hexdump -C bins/p1.bin

00000000  01 03 04 02 08       |.....|

Those byte values marry up to what our program defines. You can see it’s important that our ISA doesn’t change now, otherwise our programs will have completely different meanings!

Of course, if it does change - we simply re-assemble our programs.

Tasks

Now we define our Task abstraction. A task is how our virtual machine will store the state of a program that’s been loaded off of disk and is executing / has executed. We only need some basics to define the state of our task.

pub struct Task {
    /// The path to the task
    pub path: String,
    
    /// Program code
    pub code: Program,
    
    /// Current program counter (instruction pointer register)
    pub pc: usize,
    
    /// Stack state
    pub stack: Vec<i32>,
    
    /// Determines if this task is still running
    pub done: bool,
    
    /// Priority of this task
    pub priority: usize,
}

We can load programs off of disk and make tasks out of them ready to execute.

impl Task {
    pub fn load(path: &str) -> std::io::Result<Task> {
        let raw_code = std::fs::read(path).expect("Failed to read file");
        let instructions = raw_code
            .iter()
            .filter_map(|&b| Instruction::from_byte(b)) /* filter bytes out that are invalid */
            .collect::<Vec<Instruction>>();
        
        let task = Task {
            path: path.to_string(),
            code: instructions,
            pc: 0,
            stack: [].to_vec(),
            done: false,
            priority: 0
        };

        return Ok(task);
    }
}

Virtual Machine

Now that we have tasks loaded from disk, we need something that can run them.

Enter the virtual machine.

This is the component responsible for:

  • tracking and storing tasks,
  • executing one instruction at a time,
  • and cooperatively multitasking between tasks (our scheduler).

Let’s look at the core VM structure:

pub struct VM {
    pub tasks: Vec<Task>,
}

impl VM {
    pub fn new() -> Self {
        Self { tasks: Vec::new() }
    }

    pub fn add_task(&mut self, task: Task) {
        self.tasks.push(task);
    }

    fn has_runnables(&self) -> bool {
        return self.tasks.len() > 0 && self.tasks.iter().any(|t| !t.done);
    }

    fn execute(&self, task: &Task) {
        let instruction = task.code[task.pc];
        println!("[{}]: {:?}", task.path, instruction);
    }
}

The virtual machine stores a collection of Tasks, and has three responsibilities:

  1. add_task – load a task into the VM.
  2. has_runnables – check if any task still needs CPU time.
  3. execute – run one instruction for a given task.

Note: execute receives a reference to a task (&Task). We don’t take ownership of the task, because the scheduler must be able to revisit it later and resume execution.

Scheduling — Round Robin

Now that we have tasks stored in the VM, we need to schedule them.

We’re going to implement the simplest scheduling algorithm: round robin.

Round robin is:

  • fair (every task gets a turn),
  • predictable (runs tasks in order),
  • and conceptually simple (loop through the task list over and over).

Here is the scheduler loop:

pub fn run_round_robin(&mut self) {
    while self.has_runnables() {
        // loop each task once per round
        for idx in (0..self.tasks.len()) {
            {
                let task_ref: &Task = &self.tasks[idx];

                if task_ref.done {
                    continue;
                }

                // print the instruction that's about to execute
                self.execute(task_ref);
            }

            // now we advance the task forward one instruction
            let task = &mut self.tasks[idx];

            if !task.done {
                task.step();

                if task.pc >= task.code.len() {
                    task.done = true;
                }
            }
        }

        // cleanup (remove finished tasks from the VM)
        self.tasks.retain(|t| !t.done);
    }
}

A few important points:

  • Only one instruction executes per task per loop — simulating time slices.
  • Each iteration of the outer while loop represents one round of CPU scheduling.
  • retain removes finished tasks so the VM doesn’t waste time checking them.

That’s cooperative multitasking: the VM does just enough work per task, then moves on.

Test Programs

We need some test programs to feed into this system. Here’s the code (using our assembler) that creates three programs that we can use called p1, p2, and p3.

let p1 = vec![
    Instruction::PUSH,
    Instruction::ADD,
    Instruction::SUB,
    Instruction::LOAD,
    Instruction::HALT,
];

let p2 = vec![
    Instruction::SLEEP,
    Instruction::SLEEP,
    Instruction::SLEEP,
    Instruction::NOP,
    Instruction::HALT,
];

let p3 = vec![
    Instruction::PUSH,
    Instruction::LOAD,
    Instruction::HALT,
];

assemble_to_disk(&p1, "./bins/p1.bin").expect("couldn't write binary");
assemble_to_disk(&p2, "./bins/p2.bin").expect("couldn't write binary");
assemble_to_disk(&p3, "./bins/p3.bin").expect("couldn't write binary");

With these programs, you’ll now be able to see how they “animate” through the scheduler as it chooses which to execute.

Running It

Let’s put it all together.

We’ll load three programs — each assembled into a .bin file — and execute them through our VM:

let mut vm = VM::new();

vm.add_task(Task::load("./bins/p1.bin").expect("couldn't read binary"));
vm.add_task(Task::load("./bins/p2.bin").expect("couldn't read binary"));
vm.add_task(Task::load("./bins/p3.bin").expect("couldn't read binary"));

vm.run_round_robin();

Running this produces output like the following:

[./bins/p1.bin]: PUSH
[./bins/p2.bin]: SLEEP
[./bins/p3.bin]: PUSH
[./bins/p1.bin]: ADD
[./bins/p2.bin]: SLEEP
[./bins/p3.bin]: LOAD
[./bins/p1.bin]: SUB
[./bins/p2.bin]: SLEEP
[./bins/p3.bin]: HALT
[./bins/p1.bin]: LOAD
[./bins/p2.bin]: NOP
[./bins/p1.bin]: HALT
[./bins/p2.bin]: HALT

Each task gets one instruction at a time.
They appear to multitask, but really, we are just interleaving execution.

This is exactly how early cooperative schedulers worked.

Different Algorithms

We implemented Round Robin and looked at Priority Scheduling, but operating systems use a variety of scheduling strategies. Each one optimizes something different — fairness, throughput, responsiveness, or predictability.

Here’s a breakdown of the most common ones you’ll see in real operating systems:

First-Come, First-Served (FCFS)

The simplest possible scheduler.

  • Tasks are run in the order they arrive.
  • No preemption — once a task starts running, it keeps the CPU until completion.

Pros:

  • Very predictable.
  • Easy to implement.

Cons:

  • Terrible response time — one long task can block all others (the “convoy effect”).

Used in: batch systems, print queues, embedded devices.

Shortest Job First (SJF) / Shortest Remaining Time First (SRTF)

Runs the shortest task first.

  • SJF — non-preemptive (once a task starts, it finishes).
  • SRTF — preemptive (if a new shorter task arrives, preempt the current one).

Pros:

  • Great throughput (lowest total completion time).

Cons:

  • Requires knowing how long tasks will run (hard in general).
  • Small tasks can starve large tasks.

Used in: job schedulers, long-running batch systems, HPC.

Round Robin (RR) (what we implemented)

Each task gets a fixed unit of time (called a quantum), and then moves to the back of the queue.

Pros:

  • Very fair.
  • Great for interactive workloads (UI responsiveness).

Cons:

  • If quantum is too short: too much overhead (context switching).
  • If quantum too long: behaves like FCFS.

Used in: timesharing OS kernels, early Unix schedulers.

Priority Scheduling

Each task has a priority number.

  • Always selects the runnable task with the highest priority.
  • Can be preemptive or non-preemptive.

Pros:

  • High-importance tasks get CPU time first.

Cons:

  • Starvation — low priority tasks may never run.

Used in: realtime systems, audio/video processing, embedded control software.

Multi-Level Feedback Queue (MLFQ) (how modern OS schedulers work)

Combines priority and round robin.

  • Multiple queues (high priority to low priority)
  • Round Robin within each queue
  • Tasks that use a lot of CPU get demoted to lower priority queues
  • Tasks that frequently yield or sleep get promoted (interactive = fast)

Pros:

  • Gives priority to interactive tasks
  • Penalizes CPU-bound tasks
  • Adapts automatically over time (no tuning per process)

Cons:

  • Harder to implement
  • Requires tracking task behavior (history)

Used in: Windows, macOS, Linux (CFS is conceptually similar).

Comparison

Algorithm Preemptive Goal / Optimization Fairness Weaknesses
FCFS N Simplicity Low One long task blocks everything
SJF / SRTF Y/N Lowest total completion time Low Starvation of long tasks
Round Robin Y Responsiveness / interactivity High Requires good quantum tuning
Priority Scheduling Y/N Importance / latency sensitivity Low Starvation of low priorities
Multi-Level Feedback Queue Y Realistic, adaptive fairness High More complex to implement

TL;DR

Different schedulers optimize for different outcomes:

  • Fairness? → Round Robin
  • Highest priority first? → Priority Scheduling
  • Best throughput? → Shortest Job First / SRTF
  • Real OS behavior? → Multi-Level Feedback Queue

Schedulers are tradeoffs — once you understand what they optimize, you understand why real operating systems don’t use just one mechanism.

Conclusion

We started with nothing but a tiny instruction set and finished with:

  • a program format,
  • an assembler,
  • a task abstraction,
  • a virtual machine,
  • and a functioning scheduler.

The magic was never “multitasking” — it was switching to the next task at the right moment.

Schedulers are simple at their heart:

Run something for a bit. Save its state. Move on.

Now that we’ve built a round-robin scheduler, it’s easy to extend:

  • task priorities (SETPRIO),
  • YIELD instruction support,
  • blocking + sleeping tasks,
  • time-based preemption (tick interrupts).

But those are chapters for another day.