
Build your own x86 Kernel Part 2

Introduction

In our previous post we successfully set up our development and build environment, booting a very basic boot loader in QEMU.

In today’s post, we’re going to add some serial integration so that we can see some “signs of life” when our boot loader runs.

Stack Setup

Before moving any further, it's a good idea to set up a temporary stack. It'll live for as long as our boot loader does. We know our boot code is loaded at 0x7C00. The stack grows downward, so we place it below the boot sector at 0x7000. This keeps it out of the way of our code and data and gives us space to work with. We disable interrupts while changing SS:SP (an interrupt during this window would push onto an uninitialized stack), then re-enable them once the stack and data segments are valid.

main:
cli                   ; disable interrupts

xor   ax, ax
mov   ss, ax
mov   sp, 0x7000      ; stack at 0000:7000 (grows downward)

mov   ds, ax          ; DS = 0 so [label] addresses resolve correctly
mov   es, ax          ; ES = 0

cld                   ; string ops auto-increment

sti                   ; re-enable interrupts

We make sure we can't be interrupted while doing this, so we clear the interrupt flag with cli. Next, we set up the stack so that SS:SP points to 0000:7000. Making ds and es point to the same segment as our code (0000) simplifies addressing of our labels. cld makes sure that our lods and stos operations always count upward. Finally, we re-enable interrupts.

It’ll look something like this:

graph TB
  A0["0x0000 — 0x03FF<br/>IVT (Interrupt Vector Table)"]
  A1["0x0400 — ~0x04FF<br/>BDA (BIOS Data Area)"]
  A2["..."]
  S["0x7000 (SS:SP start)<br/>Stack top → grows downward"]
  GAP["0x7000 — 0x7BFF<br/>Gap (free space)"]
  BOOT["0x7C00 — 0x7DFF<br/>Boot sector (512 bytes)"]
  A3["... up to conventional memory"]
  A0 --> A1 --> A2 --> S --> GAP --> BOOT --> A3

Serial

The UART (the serial port) will be our early debugging channel. It's a staple of x86 and embedded debugging, so it's perfect for what we're doing.

Registers

We write to and read from the UART's registers to set up communication over serial. Here is each of the registers.

COM1 sits at a base address of 0x3F8. The following map is an offset from that base.
Offset Description
+0 THR/RBR or DLL (when DLAB=1) — Transmit Holding / Receive Buffer / Divisor Latch Low
+1 IER or DLM (when DLAB=1) — Interrupt Enable / Divisor Latch High
+2 IIR/FCR — Interrupt ID / FIFO Control
+3 LCR — Line Control (word length, parity, stop, DLAB)
+4 MCR — Modem Control (DTR, RTS, OUT1, OUT2, LOOP)
+5 LSR — Line Status (TX empty, RX ready, etc.)
+6 MSR — Modem Status
+7 SCR — Scratch

Init

We can now walk through the init code for the serial line.

%define COM1 0x3F8

serial_init:
  ; 1) Disable UART-generated interrupts (clear IER)
  mov   dx, COM1 + 1
  xor   ax, ax            ; AL=0
  out   dx, al            ; IER = 0 (no UART IRQs)

  ; 2) Enable DLAB so we can set the baud rate divisor
  mov   dx, COM1 + 3
  mov   al, 0x80          ; LCR: DLAB=1
  out   dx, al

  ; 3) Set divisor to 1 -> 115200 baud (on standard PC clock)
  mov   dx, COM1 + 0
  mov   al, 0x01          ; DLL = 1
  out   dx, al
  mov   dx, COM1 + 1
  xor   al, al            ; DLM = 0
  out   dx, al

  ; 4) 8 data bits, no parity, 1 stop bit; clear DLAB to use data regs
  mov   dx, COM1 + 3
  mov   al, 0x03          ; LCR: 8N1 (DLAB=0)
  out   dx, al

  ; 5) Enable FIFO, clear RX/TX FIFOs, set 14-byte RX threshold
  mov   dx, COM1 + 2
  mov   al, 0xC7          ; FCR: 1100_0111b
  out   dx, al

  ; 6) Modem Control: assert DTR, RTS, and OUT2
  mov   dx, COM1 + 4
  mov   al, 0x0B          ; MCR: DTR|RTS|OUT2
  out   dx, al

  ret

Each of these steps is needed in the init phase:

  • IER=0 (no UART IRQs): We’re going to use polling (check LSR bits) in early boot, so we explicitly disable UART interrupts.
  • DLAB=1, set divisor: Standard PC UART clock (1.8432 MHz / 16 = 115200). A divisor of 1 yields 115200 baud. Later you can choose 2 (57600), 12 (9600), etc.
  • LCR=0x03 (8N1): The classic “8 data bits, No parity, 1 stop.” Clearing DLAB returns access to THR/RBR/IER instead of the divisor latches.
  • FCR=0xC7: Enables the FIFO, clears both FIFOs, and sets the RX trigger level to 14 bytes. (On 8250/16450 parts without FIFOs this is ignored—harmless.)
  • MCR=0x0B: Asserts DTR and RTS so the other side knows we’re ready; sets OUT2, which on PCs typically gates the UART interrupt line (even if we aren’t using IRQs yet, OUT2 is commonly left on).
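The divisor arithmetic above is easy to sanity-check. Here's a quick sketch (plain Python, not part of the boot code; the helper name is ours):

```python
# Divisor = UART clock / 16 / desired baud, on the standard PC UART clock.
UART_CLOCK = 1_843_200  # 1.8432 MHz

def divisor(baud: int) -> int:
    """Divisor latch value for a desired baud rate."""
    return UART_CLOCK // 16 // baud

for baud in (115200, 57600, 9600):
    print(baud, divisor(baud))  # 115200 -> 1, 57600 -> 2, 9600 -> 12
```

These are exactly the values you'd load into DLL/DLM with DLAB set.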

Waiting

Because the UART is asynchronous, we need to wait until the transmitter holding register is ready before we send anything. This routine polls the Line Status Register until THR empty (bit 5) is set.

serial_wait_tx:
  push  ax
  push  dx

  mov   dx, COM1 + 5
.wait:
  in    al, dx
  test  al, 0x20
  jz    .wait
  
  pop   dx
  pop   ax
  ret

This is important when writing data out: we must wait until the transmitter is ready before handing it the next byte.

putc and puts

We can now call serial_wait_tx before sending a character.

The character we place in al is written directly to COM1's data register at 0x3F8.

serial_putc:
  push  dx

  call  serial_wait_tx

  mov   dx, COM1
  out   dx, al

  pop   dx
  ret

Finally, we can use putc in a loop to implement a puts which iterates through the string at si, sending each character to serial_putc.

serial_puts:
  push  ax
  push  si

.next:
  lodsb
  test  al, al
  jz    .done

  call  serial_putc
  jmp   .next

.done:
  pop   si
  pop   ax

  ret

Signs of life

Now that we have a way to integrate with the serial line, we can use it to prove signs of life in our bootloader.

After our stack is set up, we can start using these functions.

  call  serial_init         ; initialize serial

  mov   si, msg_alive       ; si = our string to print
  call  serial_puts         ; print the string

  hlt                       ; halt

.halt:
  jmp   .halt

%include "boot/serial.asm"  ; serial functions

msg_alive db "Serial communications are alive!", 0

times 510-($-$$) db 0
dw 0AA55h

To clean up the boot code, I tucked all of the serial communication code away into an asm file of its own. It's still assembled as part of boot.asm since it's simply included as text.

Setting this up for a run, you should see a message in your console.

➜  make run
qemu-system-x86_64 -drive file=os.img,format=raw -serial stdio -debugcon file:debug.log -global isa-debugcon.iobase=0xe9 -display none -no-reboot -no-shutdown -d guest_errors,cpu_reset -D qemu.log
Serial communications are alive!

We are alive!

Conclusion

These recent additions put us in a very strong position. Being able to write breadcrumbs to the console to check how far execution has made it is an invaluable debugging and diagnostic tool.

We’ll continue to build on this when we return for Stage2.

Build your own x86 Kernel Part 1

Introduction

One of the best ways to learn how computers work is to get as close to the hardware as possible. Writing assembly language with no other tools or libraries really helps you to understand exactly what makes them tick. I’m building this article series to walk through the full setup of an x86 system to go from power on to a minimal running operating system.

I’ll gradually build this from the ground up, introducing concepts as we go through these articles.

Today, we'll get all the tooling and build environment set up so we can develop comfortably.

Tools

Before we begin, we need some tools installed.

  • QEMU for virtualising the system that will run our operating system
  • NASM to be our assembler
  • Make to manage our build chain

Get these installed on your respective system, and we can get the project directory set up.

Project Setup

First up, let’s create our project directory and get our Makefile and bootloader started.

mkdir byo_os
mkdir byo_os/boot

cd byo_os

Boot loader

A boot loader is the very first piece of software that runs when a computer starts. Its job is to prepare the CPU and memory so that a full operating system can take over. When the machine powers on, the BIOS (or UEFI) firmware looks for a bootable program and transfers control to it.

In this tutorial we’re building a BIOS-style boot loader. When a machine boots in legacy BIOS mode, the firmware reads the first 512 bytes of the boot device — called the boot sector — into memory at address 0x7C00 and jumps there. Those 512 bytes must end with the magic signature 0xAA55, which tells the BIOS that this sector is bootable. From that point, our code is executing directly on the CPU in 16-bit real mode, with no operating system or filesystem support at all.

Modern systems use UEFI, which is the successor to BIOS. UEFI firmware looks for a structured executable (a PE/COFF file) stored on a FAT partition and provides a much richer environment — including APIs for disk I/O, graphics, and memory services. It’s powerful, but it’s also more complex and hides many of the low-level details we want to understand.

Starting with BIOS keeps things simple: one sector, one jump, and full control. Once we’ve built a working real-mode boot loader and kernel, it’ll be easy to explore a UEFI variant later because the CPU initialization concepts remain the same — only the firmware interface changes.

Here is our first boot loader.

; ./boot/boot.asm

ORG 0x7C00                  ; our code starts at 0x7C00
BITS 16                     ; we're in 16-bit real mode

main:
  cli                       ; no interrupts
  hlt                       ; stop the processor

.halt:
  jmp   .halt

times 510-($-$$) db 0       ; pad out to 510 bytes
dw 0AA55h                   ; 2 byte signature

Our boot loader must be 512 bytes. We ensure that it is with times 510-($-$$) db 0. This directive pads our boot loader out to 510 bytes, leaving space for the final 2-byte signature dw 0AA55h that every boot sector must end with.
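If you ever want to verify a built image by hand, the signature check is simple. A quick sketch in Python (the helper name is ours, not part of the build):

```python
# A boot sector is 512 bytes and must end with bytes 0x55, 0xAA
# (the word 0xAA55 stored little-endian at offsets 510-511).
def is_bootable(sector: bytes) -> bool:
    return len(sector) == 512 and sector[510:512] == b"\x55\xaa"

# Example: 510 zero bytes followed by the signature.
image = bytes(510) + b"\x55\xaa"
print(is_bootable(image))  # True
```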

Building

With this code written, we need to be able to build and run it. Using a Makefile is an easy way to wrap up all of these actions so we don’t need to remember all of the build steps.

CC64 = x86_64-elf-gcc
LD64 = x86_64-elf-ld
OBJCOPY = x86_64-elf-objcopy
NASM = nasm

OBJDIR = build

all: os.img

$(OBJDIR):
	mkdir -p $(OBJDIR)

boot/boot.bin: boot/boot.asm
	$(NASM) -f bin $< -o $@

os.img: boot/boot.bin 
	rm -f $@
	dd if=boot/boot.bin of=$@ bs=512 count=1 conv=notrunc
	truncate -s $$((32*512)) $@

run: os.img
	qemu-system-x86_64 -drive file=os.img,format=raw -serial stdio -debugcon file:debug.log -global isa-debugcon.iobase=0xe9 -display none -no-reboot -no-shutdown -d guest_errors,cpu_reset -D qemu.log

clean:
	rm -rf build os.img boot/*.bin *.log

This will build a boot/boot.bin for us, and pack it into an os.img which we will use to run our OS.

The key lines in making the os image are the dd and truncate. They get our 512 byte boot sector first in the image, and then the truncate extends the image to 32 sectors (16 KB total) by padding it with zeros. The extra space simulates a small disk, leaving room for later stages like a kernel or filesystem. The first 512 bytes remain our boot sector; the rest is just blank space the BIOS ignores for now.

When we issue make run, this turns into this:

qemu-system-x86_64 -drive file=os.img,format=raw \
                   -serial stdio \
                   -debugcon file:debug.log \
                   -global isa-debugcon.iobase=0xe9 \
                   -display none \
                   -no-reboot \
                   -no-shutdown \
                   -d guest_errors,cpu_reset \
                   -D qemu.log
  • -drive file=os.img,format=raw attaches a raw disk image as the primary drive. When QEMU boots in BIOS mode, it loads the first sector (the MBR) if it ends with the signature 0xAA55.
  • -serial stdio redirects the guest's COM1 serial port (I/O 0x3F8) to this terminal's stdin/stdout, so any serial output from the guest appears in your console.
  • -debugcon file:debug.log dumps the debug console into a file called debug.log.
  • -global isa-debugcon.iobase=0xe9 maps QEMU's simple debug console to I/O port 0xE9. Any out 0xE9, al from your code is appended to debug.log.
  • -display none disables the graphical display window. No VGA text output will be visible unless you use -nographic, serial, or the 0xE9 debug console.
  • -no-reboot makes QEMU exit instead of rebooting on a guest reboot request (handy for catching triple-fault loops).
  • -no-shutdown keeps QEMU running on a guest power-off so logs and the console remain available.
  • -d guest_errors,cpu_reset enables QEMU's internal debug logging for guest faults and CPU resets (for example, triple faults). The messages are written to the file specified by -D.
  • -D qemu.log writes QEMU's debug logs (from -d) to qemu.log instead of stderr.

We plan to print with BIOS INT 0x10 later on, so this command will evolve as we go.

Running

Let’s give it a go.

By running make you should see output like this:

➜  make
nasm -f bin boot/boot.asm -o boot/boot.bin
rm -f os.img
dd if=boot/boot.bin of=os.img bs=512 count=1 conv=notrunc
1+0 records in
1+0 records out
512 bytes copied, 9.4217e-05 s, 5.4 MB/s
truncate -s $((32*512)) os.img

You can then run this with make run:

➜  make run
qemu-system-x86_64 -drive file=os.img,format=raw -serial stdio -debugcon file:debug.log -global isa-debugcon.iobase=0xe9 -display none -no-reboot -no-shutdown -d guest_errors,cpu_reset -D qemu.log

And there you have it. Our bootloader ran very briefly, and now our machine is halted.

Conclusion

We've managed to set up our build environment and get a very simple boot loader executing under QEMU. In further tutorials we'll integrate the serial COM1 port so that we can get some signs of life reported out to the console.

Naive Bayes Classifier from First Principles

Introduction

The Naive Bayes classifier is one of the simplest algorithms in machine learning, yet it’s surprisingly powerful.
It answers the question:

“Given some evidence, what is the most likely class?”

It’s naive because it assumes that features are conditionally independent given the class. That assumption rarely holds in the real world — but the algorithm still works remarkably well for many tasks such as spam filtering, document classification, and sentiment analysis.

At its core, Naive Bayes is just counting, multiplying probabilities, and picking the largest one.

Bayes’ Rule Refresher

First, let’s start with a quick definition of terms.

Class is the label that we’re trying to predict. In our example below, the class will be either “spam” or “ham” (not spam).

The features are the observed pieces of evidence. For text, features are usually the words in a message.

P is shorthand for “probability”.

  • P(Class) = the prior probability: how likely a class is before seeing any features.
  • P(Features | Class) = the likelihood: how likely it is to see those words if the class is true.
  • P(Features) = the evidence: how likely the features are overall, across the classes. This acts as a normalising constant so probabilities sum to 1.

Bayes’ rule tells us:

P(Class | Features) = ( P(Class) * P(Features | Class) ) / P(Features)

Since the denominator is the same for all classes, we only care about:

P(Class | Features) ∝ P(Class) * Π P(Feature_i | Class)

Naive Bayes assumes independence, so the likelihood of multiple features is just the product of the individual feature likelihoods.

Pen & Paper Example

Let’s build the smallest possible spam filter.

Our training data is 4 tiny, two-word emails:

Spam → "buy cheap"
Spam → "cheap pills"
Ham  → "meeting schedule"
Ham  → "project meeting"

Based on this training set, we can say:

P(spam) = 2/4 = 0.5  
P(ham)  = 2/4 = 0.5  

Next, we can look at the word likelihoods for a given class.

For spam (words: buy, cheap, pills):

P(buy | spam)    = 1/4  
P(cheap | spam)  = 2/4  
P(pills | spam)  = 1/4  

For ham (words: meeting, schedule, project):

P(meeting | ham)   = 2/4  
P(schedule | ham)  = 1/4  
P(project | ham)   = 1/4  

With all of this basic information in place, we can try to classify a new email.

As an example, we’ll look at an email that simply says "cheap meeting".

For spam:

  P(spam) * P(cheap|spam) * P(meeting|spam)  
= 0.5     * (2/4)         * (0/4) 
= 0  

For ham:

  P(ham) * P(cheap|ham) * P(meeting|ham)  
= 0.5    * (0/4)        * (2/4) 
= 0  

That didn't work! Both go to zero because "cheap" never appeared in ham, and "meeting" never appeared in spam. This is why we use Laplace smoothing.

Laplace Smoothing

To avoid zero probabilities, add a tiny count (usually +1) to every word in every class.

With smoothing, our likelihoods become:

P(buy | spam)    = (1+1)/(4+6) = 2/10  
P(cheap | spam)  = (2+1)/(4+6) = 3/10  
P(pills | spam)  = (1+1)/(4+6) = 2/10  
P(meeting | spam)= (0+1)/(4+6) = 1/10  
… and so on for ham.

Here 4 is the total token count in spam, and 6 is the vocabulary size.

Now the “cheap meeting” example will give non-zero values, and we can meaningfully compare classes.

For spam:

  P(spam) * P(cheap|spam) * P(meeting|spam)  
= 0.5     * (3/10)        * (1/10)
= 0.015  

For ham:

  P(ham) * P(cheap|ham) * P(meeting|ham)  
= 0.5    * (1/10)       * (3/10)
= 0.015

So both classes land on the same score — a perfect tie, in this example.

Python Demo (from scratch)

Here’s a tiny implementation that mirrors the example above:

from collections import Counter, defaultdict

# Training data
docs = [
    ("spam", "buy cheap"),
    ("spam", "cheap pills"),
    ("ham",  "meeting schedule"),
    ("ham",  "project meeting"),
]

class_counts = Counter()
word_counts = defaultdict(Counter)

# Build counts
for label, text in docs:
    class_counts[label] += 1
    for word in text.split():
        word_counts[label][word] += 1

def classify(text, alpha=1.0):
    words = text.split()
    scores = {}
    total_docs = sum(class_counts.values())
    vocab = {w for counts in word_counts.values() for w in counts}
    V = len(vocab)

    for label in class_counts:
        # Prior
        score = class_counts[label] / total_docs
        total_words = sum(word_counts[label].values())

        for word in words:
            count = word_counts[label][word]
            # Laplace smoothing
            score *= (count + alpha) / (total_words + alpha * V)

        scores[label] = score

    # Pick the class with the highest score
    return max(scores, key=scores.get), scores

print(classify("cheap meeting"))
print(classify("project schedule"))
print(classify("cheap schedule"))

Running this gives:

('spam', {'spam': 0.015, 'ham': 0.015})
('ham', {'spam': 0.005, 'ham': 0.02})
('spam', {'spam': 0.015, 'ham': 0.01})

As we predicted earlier, "cheap meeting" is a tie, while "project schedule" is more likely ham. Finally, "cheap schedule" comes out as spam because it uses stronger spam trigger words.

Real-World Notes

  • Naive Bayes is fast, memory-efficient, and easy to implement.
  • Works well for text classification, document tagging, and spam filtering.
  • The independence assumption is rarely true, but it doesn’t matter — it often performs surprisingly well.
  • In production, you’d tokenize better, remove stop words, and work with thousands of documents.
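One production detail worth sketching: multiplying many small probabilities underflows to zero on long documents, so real implementations sum log probabilities instead. A minimal sketch using the toy counts from the worked example (the names here, like log_score, are ours):

```python
import math

# Toy counts from the worked example, restated so this runs on its own.
priors = {"spam": 0.5, "ham": 0.5}
counts = {
    "spam": {"buy": 1, "cheap": 2, "pills": 1},
    "ham":  {"meeting": 2, "schedule": 1, "project": 1},
}
V = 6          # vocabulary size
alpha = 1.0    # Laplace smoothing

def log_score(text: str, label: str) -> float:
    """Log prior plus summed log likelihoods (with smoothing)."""
    total = sum(counts[label].values())
    score = math.log(priors[label])
    for w in text.split():
        c = counts[label].get(w, 0)
        score += math.log((c + alpha) / (total + alpha * V))
    return score

# Same ranking as the probability version, but safe for long documents.
print(log_score("project schedule", "ham"))   # equals log(0.02)
print(log_score("project schedule", "spam"))  # equals log(0.005)
```

Because log is monotonic, comparing log scores picks the same class as comparing the raw products.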

Conclusion

Building a Naive Bayes classifier from first principles is a great exercise because it shows how machine learning can be just careful counting and probability. With priors, likelihoods, and a dash of smoothing, you get a surprisingly useful classifier — all without heavy math or libraries.

Bloom Filters Made Simple

Introduction

A Bloom filter is a tiny, probabilistic memory that answers “Have I seen this before?” in constant time. It never lies with a false negative—if it says “no”, the item was definitely never added. But to save huge amounts of space versus storing all items, it allows false positives—sometimes it will say “probably yes” when collisions happen.

The trick is simple: keep a row of bits and, for each item, flip a small handful of positions chosen by hash functions.

Later, check those same positions. Any zero means “definitely not”; all ones means “probably yes.” With just a few bytes, Bloom filters help databases skip disk lookups, caches dodge misses, and systems answer membership queries blazingly fast.

Pen & Paper Example

Let’s build the smallest possible Bloom filter: 10 bits, indexed 0–9.
We’ll use two playful “hash” functions:

h1(word) = (sum of letter positions) % 10 (a=1, b=2, …, z=26)
h2(word) = (len(word) * 3)           % 10

To insert a word, flip bits h1(word) and h2(word) to 1.
To query, compute the same bits:

  • If any bit is 0 → definitely not present
  • If both are 1 → probably present

With these rules in place, we can start to insert some words.

If we were to insert the word “cat”:

h1("cat") = (3+1+20) = 24 %10=4  
h2("cat") = (3*3)    = 9  %10=9  

Flip bits 4 and 9.

Bits:

idx:  0 1 2 3 4 5 6 7 8 9  
bits: 0 0 0 0 1 0 0 0 0 1

Similarly, we can try inserting the word “dog”:

h1("dog") = (4+15+7) = 26 %10=6  
h2("dog") = (3*3)    = 9  %10=9  

Flip bits 6 and 9.

Bits:

idx:  0 1 2 3 4 5 6 7 8 9  
bits: 0 0 0 0 1 0 1 0 0 1

We can now start to try a few queries. If we look for “cat”, we can see that bits 4 and 9 are both 1, which indicates that “cat” is probably present.

If we look for something that we haven’t added yet, like “cow”:

h1("cow") = (3+15+23) = 41 %10=1  
h2("cow") = (3*3)     = 9  %10=9  

Bit 1 is still 0 so that tells us that it’s definitely not present.

Python Demo (matching the example)

This code reproduces the steps above:

from string import ascii_lowercase

# Map a..z -> 1..26
ABC = {ch: i+1 for i, ch in enumerate(ascii_lowercase)}

def h1(word: str, m: int) -> int:
    return sum(ABC.get(ch, 0) for ch in word.lower()) % m

def h2(word: str, m: int) -> int:
    return (len(word) * 3) % m

class Bloom:
    def __init__(self, m=10):
        self.m = m
        self.bits = [0]*m

    def _idxs(self, word):
        return [h1(word, self.m), h2(word, self.m)]

    def add(self, word):
        for i in self._idxs(word):
            self.bits[i] = 1

    def might_contain(self, word) -> bool:
        return all(self.bits[i] == 1 for i in self._idxs(word))

    def show(self):
        idx = "idx:  " + " ".join(f"{i}" for i in range(self.m))
        bts = "bits: " + " ".join(str(b) for b in self.bits)
        print(idx)
        print(bts)
        print()

bf = Bloom(m=10)

print("Insert 'cat'")
bf.add("cat")
bf.show()

print("Insert 'dog'")
bf.add("dog")
bf.show()

def check(w):
    print(f"Query '{w}':", "probably present" if bf.might_contain(w) else "definitely not present")

check("cat")
check("cow")
for w in ["dot", "cot", "tag", "god"]:
    check(w)

Run this and you’ll see the exact same evolution of bits as the “pen & paper” example.

Insert 'cat'
idx:  0 1 2 3 4 5 6 7 8 9
bits: 0 0 0 0 1 0 0 0 0 1

Insert 'dog'
idx:  0 1 2 3 4 5 6 7 8 9
bits: 0 0 0 0 1 0 1 0 0 1

Query 'cat': probably present
Query 'cow': definitely not present
Query 'dot': probably present
Query 'cot': definitely not present
Query 'tag': definitely not present
Query 'god': probably present

Toward Real Bloom Filters

Our toy version used silly hash rules. Real implementations use stronger hash functions, typically deriving all k indexes from one or two base hashes. Here's a slightly more realistic snippet using double hashing over SHA-256 and MD5:

import hashlib

class RealishBloom:
    def __init__(self, m=128, k=4):
        self.m = m
        self.k = k
        self.bits = [0]*m

    def _hashes(self, word: str):
        w = word.encode()
        h = int.from_bytes(hashlib.sha256(w).digest(), "big")
        g = int.from_bytes(hashlib.md5(w).digest(), "big")
        for i in range(self.k):
            yield (h + i*g) % self.m

    def add(self, word: str):
        for i in self._hashes(word):
            self.bits[i] = 1

    def might_contain(self, word: str) -> bool:
        return all(self.bits[i] == 1 for i in self._hashes(word))

This implementation will allow you to filter much more complex (and longer) content. The wider your bit field is, and the more complex your hashing algorithms are, the better bit distribution you will get. This gives you a lower chance of false positives, improving the overall performance of the data structure.

Conclusion

Bloom filters are elegant because of their simplicity: flip a few bits when adding, check those bits when querying. They trade absolute certainty for massive savings in memory and time. They’re everywhere—from browsers to databases to networking—and now, thanks to a handful of cat, dog, and cow, you know how they work.

Maybe Monads in Python

Introduction

If you’ve spent any time in Haskell or FP circles, you’ll have run into the terms Functor, Applicative, and Monad. They can sound mysterious, but at their core they’re just design patterns for sequencing computations.

Python isn’t a purely functional language, but we can still capture these ideas in code. In this post, we’ll build a full Maybe type in Python: a safe container that represents either a value (Some) or no value (Nothing). We’ll compare it with the Haskell version along the way.

A full runnable demo of the code presented here is available as a gist up on GitHub.

Maybe

We start off with our box, or context. In our case today, the box might hold a value (Some) or it may be empty (Nothing). Because both of these are derivatives of the same thing, we create a base class of Maybe.

from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class Maybe(Generic[T]):
    @staticmethod
    def some(value: T) -> "Maybe[T]":
        return Some(value)

    @staticmethod
    def nothing() -> "Maybe[T]":
        return NOTHING  # singleton

    @staticmethod
    def from_nullable(value: Optional[T]) -> "Maybe[T]":
        return Nothing() if value is None else Some(value)

    @staticmethod
    def from_predicate(value: T, predicate: Callable[[T], bool]) -> "Maybe[T]":
        return Some(value) if predicate(value) else Nothing()

Now we need our derivatives:

from dataclasses import dataclass

@dataclass(frozen=True)
class Some(Maybe[T]):
    value: T

class Nothing(Maybe[Any]):
    pass

NOTHING = Nothing()  # the singleton returned by Maybe.nothing()
Our Maybe class here defines all of the operations we want to be able to perform on this datatype, but does not implement any of them, leaving the implementation to be filled in by the derivatives. You can expect the implementations in these derived classes to differ quite a bit from each other.

We should end up with something like this:

classDiagram
  class Maybe {
    +map(f)
    +ap(mb)
    +bind(f)
  }
  class Some {
    +value: T
  }
  class Nothing
  Maybe <|-- Some
  Maybe <|-- Nothing

Functor: Mapping over values

A Functor is anything you can map a function over. In Haskell, the generic Functor version of this is called fmap:

fmap (+1) (Just 10)   -- Just 11
fmap (+1) Nothing     -- Nothing

The flow of values through map (or fmap) looks like this:

flowchart LR
  A[Some 10] -->|map +1| B[Some 11]
  C[Nothing] -->|map +1| D[Nothing]

For our Python implementation, we implement map like this:

# "Some" implementation
def map(self, f: Callable[[T], U]) -> "Maybe[U]":
    try:
        return Some(f(self.value))
    except Exception:
        return Nothing()

# "Nothing" implementation
def map(self, f: Callable[[Any], U]) -> "Maybe[U]":
    return self

We can now implement the example using these functions:

Some(10).map(lambda x: x + 1)   # Some(11)
Nothing().map(lambda x: x + 1)  # Nothing()

The idea: if there’s a value inside, apply the function. If not, do nothing.

Applicative: Combining values

Applicatives let us apply functions that are also inside the box. In Haskell, this is the <*> operator:

pure (+) <*> Just 2 <*> Just 40   -- Just 42

Here we’re applying a function wrapped in Some to a value wrapped in Some. If either side is Nothing, the result is Nothing.

flowchart LR
  F[Some f] -->|ap| V[Some 2]
  V --> R[Some f2]
  FN[Some f] -->|ap| N[Nothing]
  N --> RN[Nothing]
  N2[Nothing] -->|ap| V2[Some 2]
  V2 --> RN2[Nothing]

For our Python implementation, we’ll call this ap.

The Some implementation takes a function out of one box, and applies it to the value inside another box:

def ap(self: "Maybe[Callable[[T], U]]", mb: "Maybe[T]") -> "Maybe[U]":
    func = self.value
    return mb.map(func)

The Nothing implementation just returns itself:

def ap(self: "Maybe[Callable[[Any], U]]", mb: "Maybe[Any]") -> "Maybe[U]":
    return self

This lets us combine multiple values when both boxes are full:

add = lambda x, y: x + y
Some(add).ap(Some(2)).ap(Some(40))   # Some(42)
Some(add).ap(Some(2)).ap(Nothing())  # Nothing()

Monad: Sequencing computations

A Monad takes things further: it lets us chain together computations that themselves return a Maybe.

In Haskell, this is the >>= operator (bind):

halfIfEven :: Int -> Maybe Int
halfIfEven x = if even x then Just (x `div` 2) else Nothing

Just 10 >>= halfIfEven    -- Just 5
Just 3  >>= halfIfEven    -- Nothing

Here we’re chaining a computation that itself returns a Maybe. If the starting point is Nothing, or if the function returns Nothing, the whole chain collapses.

flowchart LR
  S[Some x] --bind f--> FOUT[Some y]
  S --bind g--> GOUT[Nothing]
  N[Nothing] --bind f--> NRES[Nothing]

In Python we implement bind:

# "Some" implementation
def bind(self, f: Callable[[T], Maybe[U]]) -> "Maybe[U]":
    try:
        return f(self.value)
    except Exception:
        return Nothing()

# "Nothing" implementation
def bind(self, f: Callable[[Any], Maybe[U]]) -> "Maybe[U]":
    return self

And use it like this:

def half_if_even(x: int) -> Maybe[int]:
    return Some(x // 2) if x % 2 == 0 else Nothing()

Some(10).bind(half_if_even)   # Some(5)
Some(3).bind(half_if_even)    # Nothing()

Notice how the “empty box” propagates: if at any point we hit Nothing, the rest of the chain is skipped.

You'll also see a common pattern across all of the Nothing implementations: there's no computation, it simply returns itself. As soon as you hit Nothing, you're short-circuited to Nothing.

Do Notation (Syntactic Sugar)

Haskell makes monadic code look imperative with do notation:

do
  a <- Just 4
  b <- halfIfEven a
  return (a + b)

In Python, we can approximate this style using a generator-based decorator. Each yield unwraps a Maybe, and the whole computation short-circuits if we ever see Nothing.

@maybe_do
def pipeline(start: int):
    a = yield Some(start + 1)
    b = yield half_if_even(a)
    c = yield Maybe.from_predicate(b + 3, lambda n: n > 4)
    return a + b + c

print(pipeline(3))  # Some(11)
print(pipeline(1))  # Nothing()

This isn’t strictly necessary, but it makes larger chains of monadic code read like straight-line Python.
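The decorator itself lives in the gist, but the idea is small enough to sketch here. This is our own minimal version, with stand-in Some/Nothing classes so it runs on its own; the gist's implementation may differ in detail:

```python
from dataclasses import dataclass
from typing import Any

# Minimal stand-ins so this sketch is self-contained;
# in the post these come from the Maybe definitions above.
@dataclass(frozen=True)
class Some:
    value: Any

class Nothing:
    def __repr__(self) -> str:
        return "Nothing()"

def maybe_do(gen_func):
    """Drive a generator, unwrapping each yielded Maybe.

    Short-circuits to Nothing as soon as one is yielded;
    wraps the generator's return value in Some otherwise."""
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        try:
            m = next(gen)                 # first yielded Maybe
            while True:
                if isinstance(m, Nothing):
                    return m              # short-circuit
                m = gen.send(m.value)     # unwrap and resume
        except StopIteration as stop:
            return Some(stop.value)       # wrap the final return
    return wrapper

def half_if_even(x: int):
    return Some(x // 2) if x % 2 == 0 else Nothing()

@maybe_do
def pipeline(start: int):
    a = yield Some(start + 1)
    b = yield half_if_even(a)
    return a + b

result = pipeline(3)  # returns Some(6): a=4, b=2
dead = pipeline(2)    # returns Nothing(): half_if_even(3) short-circuits
```

The wrapper never calls the generator body again after a Nothing, which is exactly the short-circuiting behaviour bind gives us.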

Wrapping Up

By porting Maybe into Python and implementing map, ap, and bind, we’ve seen how Functors, Applicatives, and Monads aren’t magic at all — just structured patterns for working with values in context.

  • Functor: apply a function inside the box.
  • Applicative: apply a function that’s also in a box.
  • Monad: chain computations that each return a box.

Haskell bakes these ideas into the language; in Python, we can experiment with them explicitly. The result is safer, more composable code — and maybe even a little functional fun.