One of the best ways to learn how computers work is to get as close to the hardware as possible. Writing assembly
language with no other tools or libraries really helps you to understand exactly what makes them tick. I’m building
this article series to walk through the full setup of an x86 system to go from
power on to a minimal running operating system.
I’ll gradually build this from the ground up, introducing concepts as we go through these articles.
Today, we’ll get all the tooling and the build environment set up so we can develop comfortably.
Tools
Before we begin, we need some tools installed.
QEMU for virtualising the system that will run our operating system
NASM to be our assembler
Make to manage our build chain
Get these installed on your system, and we can start setting up the project directory.
Project Setup
First up, let’s create our project directory and get our Makefile and bootloader started.
mkdir byo_os
mkdir byo_os/boot
cd byo_os
Boot loader
A boot loader is the very first piece of software that runs when a computer starts. Its job is to prepare the CPU and
memory so that a full operating system can take over. When the machine powers on, the BIOS (or UEFI) firmware
looks for a bootable program and transfers control to it.
In this tutorial we’re building a BIOS-style boot loader.
When a machine boots in legacy BIOS mode, the firmware reads the first 512 bytes of the boot device — called the
boot sector — into memory at address 0x7C00 and jumps there. Those 512 bytes must end with the magic signature
0xAA55, which tells the BIOS that this sector is bootable. From that point, our code is executing directly on the CPU
in 16-bit real mode, with no operating system or filesystem support at all.
Modern systems use UEFI, which is the successor to BIOS. UEFI firmware looks for a structured executable (a
PE/COFF file) stored on a FAT partition and provides a much richer environment — including APIs for disk I/O,
graphics, and memory services. It’s powerful, but it’s also more complex and hides many of the low-level details we
want to understand.
Starting with BIOS keeps things simple: one sector, one jump, and full control. Once we’ve built a working
real-mode boot loader and kernel, it’ll be easy to explore a UEFI variant later because the CPU initialization concepts
remain the same — only the firmware interface changes.
Here is our first boot loader.
; ./boot/boot.asm
ORG 0x7C00                  ; our code starts at 0x7C00
BITS 16                     ; we're in 16-bit real mode

main:
    cli                     ; no interrupts
    hlt                     ; stop the processor

.halt:
    jmp .halt

times 510-($-$$) db 0       ; pad out to 510 bytes
dw 0AA55h                   ; 2 byte signature
Our boot loader must be 512 bytes. We ensure that it is with times 510-($-$$) db 0. This directive pads our
boot loader out to 510 bytes, leaving space for the final 2 signature bytes dw 0AA55h which all boot loaders must
finish with.
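To make the layout concrete, here is a small Python sketch that builds the same 512-byte sector by hand and checks the signature. The helper names (make_boot_sector, is_boot_sector) are made up for illustration, and the four code bytes are the hand-assembled encodings of cli, hlt and a jump-to-self rather than real NASM output.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # dw 0AA55h is stored little-endian: 0x55 then 0xAA


def make_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros to 510 bytes, then append the 2-byte signature."""
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("code does not fit in a single boot sector")
    # Same job as: times 510-($-$$) db 0
    padding = bytes(SECTOR_SIZE - 2 - len(code))
    return code + padding + BOOT_SIGNATURE


def is_boot_sector(data: bytes) -> bool:
    """True if data is exactly one sector long and ends with 0x55 0xAA."""
    return len(data) == SECTOR_SIZE and data[-2:] == BOOT_SIGNATURE


# Hand-assembled bytes for: cli (FA), hlt (F4), jmp to self (EB FE)
code = bytes([0xFA, 0xF4, 0xEB, 0xFE])
sector = make_boot_sector(code)
print(len(sector), sector[-2:].hex(), is_boot_sector(sector))  # 512 55aa True
```

The same is_boot_sector check can be pointed at the bytes of boot/boot.bin once it has been assembled.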
Building
With this code written, we need to be able to build and run it. Using a Makefile is an easy way to wrap up all of
these actions so we don’t need to remember all of the build steps.
The Makefile builds boot/boot.bin for us, and also packs it into an os.img which we will use to run our OS.
The key lines in making the OS image are the dd and truncate commands. dd writes our 512-byte boot sector at the start
of the image, and truncate then extends the image to 32 sectors (16 KB total) by padding it with zeros. The extra space
simulates a small disk, leaving room for later stages like a kernel or filesystem. The first 512 bytes remain our boot
sector; the rest is just blank space the BIOS ignores for now.
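The image layout is simple enough to sketch in a few lines of Python. This is only an illustration of what dd and truncate produce, not a replacement for the Makefile; build_image is a hypothetical helper and the boot sector here is a blank placeholder with a valid signature.

```python
SECTOR_SIZE = 512


def build_image(boot_sector: bytes, total_sectors: int = 32) -> bytes:
    """Place the boot sector first, then zero-fill up to total_sectors."""
    if len(boot_sector) != SECTOR_SIZE:
        raise ValueError("boot sector must be exactly 512 bytes")
    # Everything after the first sector is blank, as truncate leaves it
    return boot_sector + bytes((total_sectors - 1) * SECTOR_SIZE)


# Placeholder boot sector: 510 zero bytes plus the 0x55 0xAA signature
boot = bytes(510) + b"\x55\xaa"
image = build_image(boot)
print(len(image))  # 16384, i.e. 32 * 512
```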
-drive file=os.img,format=raw: attaches a raw disk image as the primary drive. When QEMU boots in BIOS mode, it loads the first sector (the MBR) if it ends with the signature 0xAA55.
-serial stdio: redirects the guest’s COM1 serial port (I/O 0x3F8) to this terminal’s stdin/stdout, so any serial output from the guest appears in your console.
-debugcon file:debug.log: dumps the debug console into a file called debug.log.
-global isa-debugcon.iobase=0xe9: maps QEMU’s simple debug console to I/O port 0xE9. Any out 0xE9, al from your code is appended to debug.log.
-display none: disables the graphical display window, so the guest’s VGA text output is not shown. Visible output has to come from the serial port or the 0xE9 debug console.
-no-reboot: on a guest reboot request, do not reboot; QEMU exits instead (handy for catching triple-fault loops).
-no-shutdown: on a guest power-off, don’t quit QEMU; keep it running so logs/console remain available.
-d guest_errors,cpu_reset: enables QEMU’s internal debug logging for guest faults and CPU resets (for example, triple faults). The messages are written to the file specified by -D.
-D qemu.log: writes QEMU’s debug logs (from -d) to qemu.log instead of stderr.
We plan to print with BIOS INT 0x10 later on, so this command line will evolve as we go.
Running
Let’s give it a go.
By running make you should see output like this:
➜ make
nasm -f bin boot/boot.asm -o boot/boot.bin
rm -f os.img
dd if=boot/boot.bin of=os.img bs=512 count=1 conv=notrunc
1+0 records in
1+0 records out
512 bytes copied, 9.4217e-05 s, 5.4 MB/s
truncate -s $((32*512)) os.img
You can then run it with make run:
➜ make run
qemu-system-x86_64 -drive file=os.img,format=raw -serial stdio -debugcon file:debug.log -global isa-debugcon.iobase=0xe9 -display none -no-reboot -no-shutdown -d guest_errors,cpu_reset -D qemu.log
And there you have it. Our bootloader ran very briefly, and now our machine is halted.
Conclusion
We’ve managed to set up our build environment and get a very simple boot loader executed by QEMU. In further
tutorials we’ll look at integrating the COM1 serial port so that we can get some signs of life reported out to the
console.
The Naive Bayes classifier is one of the simplest algorithms in machine learning, yet it’s surprisingly powerful.
It answers the question:
“Given some evidence, what is the most likely class?”
It’s naive because it assumes that features are conditionally independent given the class. That assumption rarely
holds in the real world — but the algorithm still works remarkably well for many tasks such as spam filtering, document
classification, and sentiment analysis.
At its core, Naive Bayes is just counting, multiplying probabilities, and picking the largest one.
Bayes’ Rule Refresher
First, let’s start with a quick definition of terms.
Class is the label that we’re trying to predict. In our example below, the class will be either “spam” or “ham”
(not spam).
The features are the observed pieces of evidence. For text, features are usually the words in a message.
P is shorthand for “probability”.
P(Class) = the prior probability: how likely a class is before seeing any features.
P(Features | Class) = the likelihood: how likely it is to see those words if the class is true.
P(Features) = the evidence: how likely the features are overall, across the classes. This acts as a normalising constant so probabilities sum to 1.
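Putting these terms together gives Bayes' rule. The naive independence assumption then turns the likelihood into a product over the individual words. The symbol w_i for the i-th word of a message is notation introduced here for illustration.

```latex
P(\text{Class} \mid \text{Features})
  = \frac{P(\text{Features} \mid \text{Class}) \, P(\text{Class})}{P(\text{Features})}

P(\text{Features} \mid \text{Class})
  = \prod_{i} P(w_i \mid \text{Class})

\hat{c} = \arg\max_{\text{Class}} \; P(\text{Class}) \prod_{i} P(w_i \mid \text{Class})
```

Because P(Features) is the same for every class, it can be dropped when all we want is the class with the largest score.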
So both classes land on the same score — a perfect tie, in this example.
Python Demo (from scratch)
Here’s a tiny implementation that mirrors the example above:
from collections import Counter, defaultdict

# Training data
docs = [
    ("spam", "buy cheap"),
    ("spam", "cheap pills"),
    ("ham", "meeting schedule"),
    ("ham", "project meeting"),
]

class_counts = Counter()
word_counts = defaultdict(Counter)

# Build counts
for label, text in docs:
    class_counts[label] += 1
    for word in text.split():
        word_counts[label][word] += 1

def classify(text, alpha=1.0):
    words = text.split()
    scores = {}
    total_docs = sum(class_counts.values())
    vocab = {w for counts in word_counts.values() for w in counts}
    V = len(vocab)

    for label in class_counts:
        # Prior
        score = class_counts[label] / total_docs
        total_words = sum(word_counts[label].values())

        for word in words:
            count = word_counts[label][word]
            # Laplace smoothing
            score *= (count + alpha) / (total_words + alpha * V)

        scores[label] = score

    # Pick the class with the highest score
    return max(scores, key=scores.get), scores

print(classify("cheap project"))
print(classify("project schedule"))
print(classify("cheap schedule"))
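With Laplace smoothing in play, "cheap project" is no longer the tie we saw by hand: spam scores 0.015 against 0.010
for ham, because "cheap" appears twice in the spam messages while "project" appears only once in ham. "project schedule"
is more likely ham (0.02 against 0.005). Finally, "cheap schedule" is labelled spam because "cheap" is the stronger
trigger word.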
Real-World Notes
Naive Bayes is fast, memory-efficient, and easy to implement.
Works well for text classification, document tagging, and spam filtering.
The independence assumption is rarely true, but in practice the classifier often performs surprisingly well anyway.
In production, you’d tokenize better, remove stop words, and work with thousands of documents.
Conclusion
Building a Naive Bayes classifier from first principles is a great exercise because it shows how machine learning can be
just careful counting and probability. With priors, likelihoods, and a dash of smoothing, you get a surprisingly useful
classifier — all without heavy math or libraries.
A Bloom filter is a tiny, probabilistic memory that answers “Have I seen this before?” in constant time. It never lies
with a false negative—if it says “no”, the item was definitely never added. But to save huge amounts of space versus
storing all items, it allows false positives—sometimes it will say “probably yes” when collisions happen.
The trick is simple: keep a row of bits and, for each item, flip a small handful of positions chosen by hash functions.
Later, check those same positions. Any zero means “definitely not”; all ones means “probably yes.” With just a few
bytes, Bloom filters help databases skip disk lookups, caches dodge misses, and systems answer membership queries
blazingly fast.
Pen & Paper Example
Let’s build the smallest possible Bloom filter: 10 bits, indexed 0–9.
We’ll use two playful “hash” functions:
h1(word) = (sum of letter positions) % 10 (a=1, b=2, …, z=26)
h2(word) = (len(word) * 3) % 10
To insert a word, flip bits h1(word) and h2(word) to 1.
To query, compute the same bits:
If any bit is 0 → definitely not present
If both are 1 → probably present
With these rules in place, we can start to insert some words.
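The rules above can be sketched directly in Python. This assumes lowercase a to z words only; the words cat, dog and cow are used as sample inserts, and the function names are chosen here for illustration.

```python
BITS = 10
bits = [0] * BITS  # the whole filter: ten bits, indexed 0-9


def h1(word: str) -> int:
    # Sum of letter positions (a=1, b=2, ..., z=26), modulo 10
    return sum(ord(c) - ord("a") + 1 for c in word) % BITS


def h2(word: str) -> int:
    # Three times the word length, modulo 10
    return (len(word) * 3) % BITS


def insert(word: str) -> None:
    bits[h1(word)] = 1
    bits[h2(word)] = 1


def query(word: str) -> bool:
    # False means "definitely not present"; True means "probably present"
    return bits[h1(word)] == 1 and bits[h2(word)] == 1


for w in ("cat", "dog", "cow"):
    insert(w)

print(bits)            # [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(query("cat"))    # True: it was inserted
print(query("horse"))  # False: bit 5 is still 0
print(query("act"))    # True: a false positive, same letters and length as "cat"
```

Every three-letter word lands on bit 9 through h2, which is why such a tiny filter collides so quickly.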
Our toy version used silly hash rules. Real implementations use well-distributed hash functions (often fast non-cryptographic ones such as MurmurHash) and derive multiple index functions from them.
Here’s a slightly more realistic snippet using double hashing from SHA-256 and MD5:
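A minimal sketch of that idea might look like the following. The class name, the default sizes (m=1024 bits, k=4 probes) and the one-byte-per-bit storage are assumptions made for readability, not a tuned implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int = 1024, k: int = 4):
        self.m = m                 # number of bit positions
        self.k = k                 # number of probes per item
        self.bits = bytearray(m)   # one byte per bit, for clarity over compactness

    def _indexes(self, item: str):
        data = item.encode("utf-8")
        a = int.from_bytes(hashlib.sha256(data).digest(), "big")
        b = int.from_bytes(hashlib.md5(data).digest(), "big")
        # Double hashing: the i-th probe is (a + i * b) mod m
        return [(a + i * b) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for i in self._indexes(item):
            self.bits[i] = 1

    def __contains__(self, item: str) -> bool:
        # Any zero means "definitely not"; all ones means "probably yes"
        return all(self.bits[i] for i in self._indexes(item))


bf = BloomFilter()
for word in ("cat", "dog", "cow"):
    bf.add(word)

print("cat" in bf)    # True: added items are always found
print("horse" in bf)  # very likely False, but a false positive is possible
```

MD5 is used here only as a second source of bits, not for security.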
This implementation lets you filter much more complex (and longer) content. The wider your bit field is, and the
more uniformly your hash functions spread items across it, the fewer collisions you get. That gives you a lower
chance of false positives, which is the main measure of how accurate the data structure is.
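To put a rough number on that trade-off, the standard textbook approximation for the false positive rate is (1 - e^(-kn/m))^k, for m bits, k hash functions and n inserted items. The sketch below just evaluates that formula; it assumes ideal, independent hashes, so treat the results as estimates.

```python
import math


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate false positive probability for a Bloom filter.

    m: number of bits, k: number of hash functions, n: items inserted.
    Assumes the hash functions behave like independent uniform hashes.
    """
    return (1 - math.exp(-k * n / m)) ** k


# Same 100 items and 4 hashes, with a narrow and a wide bit field
narrow = false_positive_rate(m=128, k=4, n=100)
wide = false_positive_rate(m=1024, k=4, n=100)
print(f"{narrow:.3f} vs {wide:.3f}")
```

Widening the bit field from 128 to 1024 bits drops the estimated rate from most queries to roughly one in a hundred.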
Conclusion
Bloom filters are elegant because of their simplicity: flip a few bits when adding, check those bits when querying.
They trade absolute certainty for massive savings in memory and time. They’re everywhere—from browsers to databases to
networking—and now, thanks to a handful of cat, dog, and cow, you know how they work.
If you’ve spent any time in Haskell or FP circles, you’ll have run into the terms Functor, Applicative, and
Monad. They can sound mysterious, but at their core they’re just design patterns for sequencing computations.
Python isn’t a purely functional language, but we can still capture these ideas in code. In this post, we’ll build a
full Maybe type in Python: a safe container that represents either a value (Some) or no value (Nothing). We’ll
compare it with the Haskell version along the way.
A full runnable demo of the code presented here is available as a gist up on GitHub.
Maybe
We start off with our box or context. In our case today, we might have a value in the box (Some) or the box
may be empty (Nothing). Because both of these are derivatives of the same thing, we create a base class of Maybe.
Our Maybe class here defines all of the operations we want to be able to perform on this datatype, but does not
implement any of them, leaving the implementation to be filled in by the derivatives. You can expect the implementations
in these derived classes to be quite different from each other.
We should end up with something like this:
classDiagram
class Maybe {
+map(f)
+ap(mb)
+bind(f)
}
class Some {
+value: T
}
class Nothing
Maybe <|-- Some
Maybe <|-- Nothing
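One way to realise that diagram in Python is sketched below. It is an illustrative implementation under the method names from the diagram (map, ap, bind); using frozen dataclasses and the half_if_even helper are choices made here, not necessarily what the accompanying gist does.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Maybe:
    """Base class: declares the operations, implements none of them."""

    def map(self, f: Callable[[Any], Any]) -> "Maybe":
        raise NotImplementedError

    def ap(self, mb: "Maybe") -> "Maybe":
        raise NotImplementedError

    def bind(self, f: Callable[[Any], "Maybe"]) -> "Maybe":
        raise NotImplementedError


@dataclass(frozen=True)
class Some(Maybe):
    value: Any

    def map(self, f):
        return Some(f(self.value))      # Functor: apply f inside the box

    def ap(self, mb):
        return mb.map(self.value)       # Applicative: self holds the function

    def bind(self, f):
        return f(self.value)            # Monad: f already returns a Maybe


@dataclass(frozen=True)
class Nothing(Maybe):
    # No computation at all: every operation returns the empty box itself
    def map(self, f):
        return self

    def ap(self, mb):
        return self

    def bind(self, f):
        return self


def half_if_even(x: int) -> Maybe:
    return Some(x // 2) if x % 2 == 0 else Nothing()


print(Some(10).map(lambda x: x + 1))       # Some(value=11)
print(Nothing().map(lambda x: x + 1))      # Nothing()
print(Some(lambda x: x + 1).ap(Some(10)))  # Some(value=11)
print(Some(10).bind(half_if_even))         # Some(value=5)
print(Some(3).bind(half_if_even))          # Nothing()
```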
Functor: Mapping over values
A Functor is anything you can map a function over. In Haskell, the generic Functor version of this is called
fmap:
fmap (+1) (Just 10)  -- Just 11
fmap (+1) Nothing    -- Nothing
The flow of values through map (or fmap) looks like this:
A Monad takes things further: it lets us chain together computations that themselves return a Maybe.
In Haskell, this is the >>= operator (bind):
halfIfEven :: Int -> Maybe Int
halfIfEven x = if even x then Just (x `div` 2) else Nothing

Just 10 >>= halfIfEven  -- Just 5
Just 3 >>= halfIfEven   -- Nothing
Here we’re chaining a computation that itself returns a Maybe. If the starting point is Nothing, or if the
function returns Nothing, the whole chain collapses.
flowchart LR
S[Some x] --bind f--> FOUT[Some y]
S --bind g--> GOUT[Nothing]
N[Nothing] --bind f--> NRES[Nothing]
Notice how the “empty box” propagates: if at any point we hit Nothing, the rest of the chain is skipped.
You’ll also see a common pattern emerging across all of the implementations for Nothing: there’s no computation, it
simply returns itself. As soon as you hit Nothing, the rest of the chain is short-circuited.
Do Notation (Syntactic Sugar)
Haskell makes monadic code look imperative with do notation:
do
  a <- Just 4
  b <- halfIfEven a
  return (a + b)
In Python, we can approximate this style using a generator-based decorator. Each yield unwraps a Maybe, and the
whole computation short-circuits if we ever see Nothing.
This isn’t strictly necessary, but it makes larger chains of monadic code read like straight-line Python.
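As a rough sketch of how such a decorator could work: the name maybe_do, the compact Some and Nothing classes and the compute example are all assumptions made here so the block runs on its own. The decorated function must be a generator that yields Maybe values.

```python
import functools
from dataclasses import dataclass
from typing import Any


class Maybe:
    pass


@dataclass(frozen=True)
class Some(Maybe):
    value: Any


@dataclass(frozen=True)
class Nothing(Maybe):
    pass


def maybe_do(gen_fn):
    """Run a generator as a do-block: each yield unwraps a Maybe."""
    @functools.wraps(gen_fn)
    def wrapper(*args, **kwargs):
        gen = gen_fn(*args, **kwargs)
        try:
            step = next(gen)                  # first yielded Maybe
            while True:
                if isinstance(step, Nothing):
                    gen.close()
                    return step               # short-circuit the whole block
                step = gen.send(step.value)   # hand the unwrapped value back
        except StopIteration as stop:
            return Some(stop.value)           # plays the role of Haskell's return
    return wrapper


def half_if_even(x: int) -> Maybe:
    return Some(x // 2) if x % 2 == 0 else Nothing()


@maybe_do
def compute(start: int):
    a = yield Some(start)
    b = yield half_if_even(a)
    return a + b


print(compute(4))  # Some(value=6)
print(compute(3))  # Nothing()
```

compute(4) mirrors the Haskell do block above: a is 4, b is 2, and the result is wrapped back up as Some(6).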
Wrapping Up
By porting Maybe into Python and implementing map, ap, and bind, we’ve seen how Functors, Applicatives, and
Monads aren’t magic at all — just structured patterns for working with values in context.
Functor: apply a function inside the box.
Applicative: apply a function that’s also in a box.
Monad: chain computations that each return a box.
Haskell bakes these ideas into the language; in Python, we can experiment with them explicitly. The result is safer,
more composable code — and maybe even a little functional fun.
Kerberos is one of those protocols that sounds mysterious until you see it in action. The moment you type kinit, run
klist, and watch a ticket pop up, it clicks: this is Single Sign-On in its rawest form. In this post we’ll set up a
tiny realm on a Debian test box (koffing.local), get a ticket-granting ticket (TGT), and then use it for SSH without
typing a password.
What is Kerberos?
Born at MIT’s Project Athena in the 1980s, Kerberos solved campus-wide single sign-on over untrusted networks. It
matured through v4 to Kerberos 5 (the standard you use today). It underpins enterprise SSO in Windows domains
(Active Directory) and many UNIX shops.
Kerberos authenticates clients to services without sending reusable secrets. You authenticate once to the KDC, get
a TGT (Ticket Granting Ticket), then use it to obtain per-service tickets from the TGS
(Ticket Granting Service).
Services trust the KDC, not your password.
Core terms
Realm: Admin boundary (e.g., LOCAL).
Principal: Identity in the realm, like michael@LOCAL (user) or host/koffing.local@LOCAL (service).
KDC: The authentication authority. Runs on koffing.local as krb5kdc and kadmind.
TGT: Your “hall pass.” Lets you ask the KDC for service tickets.
Service ticket: What you present to a service (e.g., SSHD on koffing.local) to prove identity.
Keytab: File holding long-term service keys (like for sshd). Lets the service authenticate without storing a password.
Here’s a visual representation of how the Kerberos flow operates:
sequenceDiagram
participant U as User
participant AS as KDC/AS
participant TGS as KDC/TGS
participant S as Service (e.g., SSHD)
U->>AS: AS-REQ (I am michael)
AS-->>U: AS-REP (TGT + session key)
U->>TGS: TGS-REQ (I want ticket for host/koffing.local)
TGS-->>U: TGS-REP (service ticket)
U->>S: AP-REQ (here's my service ticket)
S-->>U: AP-REP (optional) + access granted
Ok, with all of that out of the way we can get to setting up.
Setup
There are a few packages to install and a little bit of configuration. All of these instructions are written for a
Debian/Ubuntu flavour of Linux. I’m sure that the instructions aren’t too far off for other distributions.
Install the packages
We install the Key Distribution Center (krb5-kdc), the administration server (krb5-admin-server), and some client
utilities (krb5-user).
The fully qualified name of the virtual machine that I’m testing all of this out on is koffing.local. Change these
values to suit your environment.
Edit /etc/krb5.conf and make sure it looks like this:
[libdefaults]
default_realm = LOCAL
rdns = false
dns_lookup_kdc = false
forwardable = true
[realms]
LOCAL = {
kdc = koffing.local
admin_server = koffing.local
}
[domain_realm]
.local = LOCAL
koffing.local = LOCAL
Make sure your host resolves correctly:
hostname -f    # should print: koffing.local (for me)
getent hosts koffing.local
# If needed, add to /etc/hosts:
# 127.0.1.1 koffing.local koffing
Create the KDC database
Now we initialize the database that will hold all of your principals, policies, realms, etc.
sudo mkdir -p /var/lib/krb5kdc
sudo kdb5_util create -s -r LOCAL
# set the KDC master password when prompted
This will show you which hostnames it resolves, which tickets it requests, and where it fails.
List all principals in the KDC database:
sudo kadmin.local -q "listprincs"
Clear your credential cache if tickets get stale:
kdestroy
The two most common pitfalls are:
Hostname mismatch
Realm mismatch (default realm not set in /etc/krb5.conf).
SSO
So, we’ve got the proof of concept going, but it would be good to see this in action. What we’ll cover in this next
section is getting the sshd service to trust our Kerberos tickets. This will allow for passwordless SSH for the
user.
Add the host service principal and keytab
In order to get the KDC to vouch for services, those services need principal definitions. A principal is any Kerberos
identity: users get user principals (as we saw above), and services need principals too.
A keytab is a file that stores one or more Kerberos keys (like passwords, but in cryptographic form). Unlike users
(who can type passwords into kinit), services can’t type passwords interactively. So the KDC generates a random key
for host/koffing.local@LOCAL (-randkey) and you export it into /etc/krb5.keytab with ktadd.
Now sshd can silently use that keytab to decrypt tickets clients send it.
Enable GSSAPI in sshd
The global /etc/ssh/sshd_config needs a couple of flags flicked. The SSH daemon doesn’t implement Kerberos directly,
so it uses the GSSAPI library functions provided by MIT Kerberos (or Heimdal) to handle ticket validation. GSSAPI
isn’t a protocol itself; it’s an API or abstraction layer.
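A minimal sketch of the relevant lines, assuming a stock OpenSSH server. Both option names exist in OpenSSH, but whether you want credential cleanup enabled is a choice for your environment.

```
# /etc/ssh/sshd_config
GSSAPIAuthentication yes
GSSAPICleanupCredentials yes
```

sshd needs to be restarted to pick up the change.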
Once we’ve flipped these switches, we are telling sshd: “Accept authentication from any GSSAPI mechanism.” In practice, this means Kerberos tickets.
This setup is obviously done on any server that you want to do this SSO style login with. It’s a bit confusing in my
example here, because everything is on the one machine.
Configure your SSH client
Conversely, we have configuration to do on the client side. For clients that want to connect with this type of
authentication, the following settings are required in their ~/.ssh/config:
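A minimal client-side sketch, assuming an OpenSSH client and the koffing.local host used throughout. The Host pattern is an assumption; widen or narrow it to match the servers you actually connect to.

```
# ~/.ssh/config
Host koffing.local
    GSSAPIAuthentication yes
```

With a valid TGT in your credential cache (check with klist), the client can then offer Kerberos authentication to that host.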
If everything lines up, ssh should not prompt for a password. Your Kerberos TGT has been used to authenticate silently.
Where Kerberos Fits
Kerberos is ideal for LAN-based authentication: it provides fast, passwordless single sign-on for services like SSH,
Postgres, and intranet HTTP apps. But it isn’t designed for cross-organization web or mobile use.
Modern protocols like OIDC (OpenID Connect) build on OAuth 2.0 to provide authentication and federation across the
public internet. They use signed tokens, redirect flows, and JSON-based metadata — making them better suited for SaaS,
cloud apps, and mobile clients.
In short: Kerberos is the right tool inside the castle walls; OIDC is the right tool when your users are everywhere.
Wrap-up
We’ve stood up a Kerberos realm (LOCAL), issued a TGT for a user (michael), and used it for passwordless SSH into
the same box. That’s enough to demystify Kerberos: no secrets flying across the network, just short-lived tickets
granted by a trusted KDC.
There’s plenty more we could do from here: create service principals for HTTP or Postgres, or set up
cross-realm trust.