
Naive Bayes Classifier from First Principles

Introduction

The Naive Bayes classifier is one of the simplest algorithms in machine learning, yet it’s surprisingly powerful.
It answers the question:

“Given some evidence, what is the most likely class?”

It’s naive because it assumes that features are conditionally independent given the class. That assumption rarely holds in the real world — but the algorithm still works remarkably well for many tasks such as spam filtering, document classification, and sentiment analysis.

At its core, Naive Bayes is just counting, multiplying probabilities, and picking the largest one.

Bayes’ Rule Refresher

First, let’s start with a quick definition of terms.

Class is the label that we’re trying to predict. In our example below, the class will be either “spam” or “ham” (not spam).

The features are the observed pieces of evidence. For text, features are usually the words in a message.

P is shorthand for “probability”.

  • P(Class) = the prior probability: how likely a class is before seeing any features.
  • P(Features | Class) = the likelihood: how likely it is to see those words if the class is true.
  • P(Features) = the evidence: how likely the features are overall, across the classes. This acts as a normalising constant so probabilities sum to 1.

Bayes’ rule tells us:

P(Class | Features) = ( P(Class) * P(Features | Class) ) / P(Features)

Since the denominator is the same for all classes, we only care about:

P(Class | Features) ∝ P(Class) * Π P(Feature_i | Class)

Naive Bayes assumes independence, so the likelihood of multiple features is just the product of the individual feature likelihoods.
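
Before we do it by hand, here is that decision rule as a few lines of Python (an illustrative sketch only; the priors and likelihoods below are placeholder numbers, not taken from any real data):

from math import prod

# score(class) = prior * product of per-feature likelihoods; pick the biggest.
scores = {
    "spam": 0.5 * prod([0.30, 0.20]),   # placeholder likelihoods
    "ham":  0.5 * prod([0.10, 0.25]),
}
print(max(scores, key=scores.get))      # "spam"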

Pen & Paper Example

Let’s build the smallest possible spam filter.

Our training data is four tiny, two-word emails:

Spam → "buy cheap"
Spam → "cheap pills"
Ham  → "meeting schedule"
Ham  → "project meeting"

Based on this training set, we can say:

P(spam) = 2/4 = 0.5  
P(ham)  = 2/4 = 0.5  

Next, we can look at the word likelihoods for a given class.

For spam (words: buy, cheap, pills):

P(buy | spam)    = 1/4  
P(cheap | spam)  = 2/4  
P(pills | spam)  = 1/4  

For ham (words: meeting, schedule, project):

P(meeting | ham)   = 2/4  
P(schedule | ham)  = 1/4  
P(project | ham)   = 1/4  

With all of this basic information in place, we can try and classify a new email.

As an example, we’ll look at an email that simply says "cheap meeting".

For spam:

  P(spam) * P(cheap|spam) * P(meeting|spam)  
= 0.5     * (2/4)         * (0/4) 
= 0  

For ham:

  P(ham) * P(cheap|ham) * P(meeting|ham)  
= 0.5    * (0/4)        * (2/4) 
= 0  

That didn’t work! Both go to zero because “cheap” never appeared in ham, and “meeting” never appeared in spam. This is why we use Laplace smoothing.

Laplace Smoothing

To avoid zero probabilities, add a tiny count (usually +1) to every word in every class.

With smoothing, our likelihoods become:

P(buy | spam)    = (1+1)/(4+6) = 2/10  
P(cheap | spam)  = (2+1)/(4+6) = 3/10  
P(pills | spam)  = (1+1)/(4+6) = 2/10  
P(meeting | spam)= (0+1)/(4+6) = 1/10  
… and so on for ham.

Here 4 is the total token count in spam, and 6 is the vocabulary size.
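
Written as a tiny helper function (an illustrative sketch that reuses the counts from our example):

def smoothed_likelihood(count, total_tokens, vocab_size, alpha=1):
    # (word count in class + alpha) / (tokens in class + alpha * vocabulary size)
    return (count + alpha) / (total_tokens + alpha * vocab_size)

print(smoothed_likelihood(2, 4, 6))  # P(cheap | spam)   = 3/10
print(smoothed_likelihood(0, 4, 6))  # P(meeting | spam) = 1/10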

Now the “cheap meeting” example will give non-zero values, and we can meaningfully compare classes.

For spam:

  P(spam) * P(cheap|spam) * P(meeting|spam)  
= 0.5     * (3/10)        * (1/10)
= 0.015  

For ham:

  P(ham) * P(cheap|ham) * P(meeting|ham)  
= 0.5    * (1/10)       * (3/10)
= 0.015

So both classes land on the same score — a perfect tie, in this example.

Python Demo (from scratch)

Here’s a tiny implementation that mirrors the example above:

from collections import Counter, defaultdict

# Training data
docs = [
    ("spam", "buy cheap"),
    ("spam", "cheap pills"),
    ("ham",  "meeting schedule"),
    ("ham",  "project meeting"),
]

class_counts = Counter()
word_counts = defaultdict(Counter)

# Build counts
for label, text in docs:
    class_counts[label] += 1
    for word in text.split():
        word_counts[label][word] += 1

def classify(text, alpha=1.0):
    words = text.split()
    scores = {}
    total_docs = sum(class_counts.values())
    vocab = {w for counts in word_counts.values() for w in counts}
    V = len(vocab)

    for label in class_counts:
        # Prior
        score = class_counts[label] / total_docs
        total_words = sum(word_counts[label].values())

        for word in words:
            count = word_counts[label][word]
            # Laplace smoothing
            score *= (count + alpha) / (total_words + alpha * V)

        scores[label] = score

    # Pick the class with the highest score
    return max(scores, key=scores.get), scores

print(classify("cheap meeting"))
print(classify("project schedule"))
print(classify("cheap schedule"))

Running this gives:

('spam', {'spam': 0.015, 'ham': 0.015})
('ham', {'spam': 0.005, 'ham': 0.02})
('spam', {'spam': 0.015, 'ham': 0.01})

As we predicted earlier, "cheap meeting" is a dead tie (with equal scores, max simply returns the first class it saw: spam), while "project schedule" is more likely ham. Finally, "cheap schedule" is classified as spam because it contains the strong spam trigger word "cheap".

Real-World Notes

  • Naive Bayes is fast, memory-efficient, and easy to implement.
  • Works well for text classification, document tagging, and spam filtering.
  • The independence assumption is rarely true in practice, yet the classifier often performs surprisingly well regardless.
  • In production, you’d tokenize better, remove stop words, and work with thousands of documents (see the sketch below).
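
For a sense of what the production version might look like, here’s the same toy problem expressed with scikit-learn (a sketch that assumes scikit-learn is installed; the library choice is my suggestion, not part of the walk-through above):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts  = ["buy cheap", "cheap pills", "meeting schedule", "project meeting"]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()          # tokenises and counts words
X = vec.fit_transform(texts)

clf = MultinomialNB(alpha=1.0)   # alpha=1.0 is the same Laplace smoothing
clf.fit(X, labels)

print(clf.predict(vec.transform(["cheap schedule", "project schedule"])))
# expect something like ['spam' 'ham']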

Conclusion

Building a Naive Bayes classifier from first principles is a great exercise because it shows how machine learning can be just careful counting and probability. With priors, likelihoods, and a dash of smoothing, you get a surprisingly useful classifier — all without heavy math or libraries.

Bloom Filters Made Simple

Introduction

A Bloom filter is a tiny, probabilistic memory that answers “Have I seen this before?” in constant time. It never lies with a false negative—if it says “no”, the item was definitely never added. But to save huge amounts of space versus storing all items, it allows false positives—sometimes it will say “probably yes” when collisions happen.

The trick is simple: keep a row of bits and, for each item, flip a small handful of positions chosen by hash functions.

Later, check those same positions. Any zero means “definitely not”; all ones means “probably yes.” With just a few bytes, Bloom filters help databases skip disk lookups, caches dodge misses, and systems answer membership queries blazingly fast.

Pen & Paper Example

Let’s build the smallest possible Bloom filter: 10 bits, indexed 0–9.
We’ll use two playful “hash” functions:

h1(word) = (sum of letter positions) % 10 (a=1, b=2, …, z=26)
h2(word) = (len(word) * 3)           % 10

To insert a word, flip bits h1(word) and h2(word) to 1.
To query, compute the same bits:

  • If any bit is 0 → definitely not present
  • If both are 1 → probably present

With these rules in place, we can start to insert some words.

If we were to insert the word “cat”:

h1("cat") = (3+1+20) = 24 %10=4  
h2("cat") = (3*3)    = 9  %10=9  

Flip bits 4 and 9.

Bits:

idx:  0 1 2 3 4 5 6 7 8 9  
bits: 0 0 0 0 1 0 0 0 0 1

Similarly, we can try inserting the word “dog”:

h1("dog") = (4+15+7) = 26 %10=6  
h2("dog") = (3*3)    = 9  %10=9  

Flip bits 6 and 9.

Bits:

idx:  0 1 2 3 4 5 6 7 8 9  
bits: 0 0 0 0 1 0 1 0 0 1

We can now start to try a few queries. If we look for “cat”, we can see that bits 4 and 9 are both 1, which indicates that “cat” is probably present.

If we look for something that we haven’t added yet, like “cow”:

h1("cow") = (3+15+23) = 41 %10=1  
h2("cow") = (3*3)     = 9  %10=9  

Bit 1 is still 0 so that tells us that it’s definitely not present.

Python Demo (matching the example)

This code reproduces the steps above:

from string import ascii_lowercase

# Map a..z -> 1..26
ABC = {ch: i+1 for i, ch in enumerate(ascii_lowercase)}

def h1(word: str, m: int) -> int:
    return sum(ABC.get(ch, 0) for ch in word.lower()) % m

def h2(word: str, m: int) -> int:
    return (len(word) * 3) % m

class Bloom:
    def __init__(self, m=10):
        self.m = m
        self.bits = [0]*m

    def _idxs(self, word):
        return [h1(word, self.m), h2(word, self.m)]

    def add(self, word):
        for i in self._idxs(word):
            self.bits[i] = 1

    def might_contain(self, word) -> bool:
        return all(self.bits[i] == 1 for i in self._idxs(word))

    def show(self):
        idx = "idx:  " + " ".join(f"{i}" for i in range(self.m))
        bts = "bits: " + " ".join(str(b) for b in self.bits)
        print(idx)
        print(bts)
        print()

bf = Bloom(m=10)

print("Insert 'cat'")
bf.add("cat")
bf.show()

print("Insert 'dog'")
bf.add("dog")
bf.show()

def check(w):
    print(f"Query '{w}':", "probably present" if bf.might_contain(w) else "definitely not present")

check("cat")
check("cow")
for w in ["dot", "cot", "tag", "god"]:
    check(w)

Run this and you’ll see the exact same evolution of bits as the “pen & paper” example.

Insert 'cat'
idx:  0 1 2 3 4 5 6 7 8 9
bits: 0 0 0 0 1 0 0 0 0 1

Insert 'dog'
idx:  0 1 2 3 4 5 6 7 8 9
bits: 0 0 0 0 1 0 1 0 0 1

Query 'cat': probably present
Query 'cow': definitely not present
Query 'dot': probably present
Query 'cot': definitely not present
Query 'tag': definitely not present
Query 'god': probably present

Toward Real Bloom Filters

Our toy version used silly hash rules. Real implementations use proper, well-distributed hash functions and derive several bit positions from them. Here’s a slightly more realistic snippet using double hashing built from SHA-256 and MD5:

import hashlib

class RealishBloom:
    def __init__(self, m=128, k=4):
        self.m = m
        self.k = k
        self.bits = [0]*m

    def _hashes(self, word: str):
        w = word.encode()
        h = int.from_bytes(hashlib.sha256(w).digest(), "big")
        g = int.from_bytes(hashlib.md5(w).digest(), "big")
        for i in range(self.k):
            yield (h + i*g) % self.m

    def add(self, word: str):
        for i in self._hashes(word):
            self.bits[i] = 1

    def might_contain(self, word: str) -> bool:
        return all(self.bits[i] == 1 for i in self._hashes(word))

This implementation will let you filter much longer and more varied content. The wider your bit array and the better distributed your hash functions, the fewer collisions you get, which means a lower false positive rate and better overall behaviour from the data structure.
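
To put rough numbers on that, the standard approximations (textbook results, not something derived in this post) say that with m bits, k hash functions, and n inserted items, the false positive rate is about (1 - e^(-kn/m))^k, and the best choice of k is about (m/n) * ln 2. A small sketch:

import math

def false_positive_rate(m: int, k: int, n: int) -> float:
    # probability that all k probed bits are already set after n insertions
    return (1 - math.exp(-k * n / m)) ** k

def optimal_k(m: int, n: int) -> int:
    # number of hash functions that minimises the false positive rate
    return max(1, round((m / n) * math.log(2)))

print(false_positive_rate(m=128, k=4, n=20))  # ~0.047, about a 4.7% false positive rate
print(optimal_k(m=128, n=20))                 # 4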

Conclusion

Bloom filters are elegant because of their simplicity: flip a few bits when adding, check those bits when querying. They trade absolute certainty for massive savings in memory and time. They’re everywhere—from browsers to databases to networking—and now, thanks to a handful of cat, dog, and cow, you know how they work.

Maybe Monads in Python

Introduction

If you’ve spent any time in Haskell or FP circles, you’ll have run into the terms Functor, Applicative, and Monad. They can sound mysterious, but at their core they’re just design patterns for sequencing computations.

Python isn’t a purely functional language, but we can still capture these ideas in code. In this post, we’ll build a full Maybe type in Python: a safe container that represents either a value (Some) or no value (Nothing). We’ll compare it with the Haskell version along the way.

A full runnable demo of the code presented here is available as a gist up on GitHub.

Maybe

We start off with our box, or context. In our case, the box might hold a value (Some) or it might be empty (Nothing). Because both of these are derivatives of the same thing, we create a base class of Maybe.

from dataclasses import dataclass
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class Maybe(Generic[T]):
    @staticmethod
    def some(value: T) -> "Maybe[T]":
        return Some(value)

    @staticmethod
    def nothing() -> "Maybe[T]":
        return NOTHING  # singleton

    @staticmethod
    def from_nullable(value: Optional[T]) -> "Maybe[T]":
        return Nothing() if value is None else Some(value)

    @staticmethod
    def from_predicate(value: T, predicate: Callable[[T], bool]) -> "Maybe[T]":
        return Some(value) if predicate(value) else Nothing()
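
    # A sketch (not taken from the gist) of how the base class could also declare
    # map, ap, and bind, leaving the real work to Some and Nothing:
    def map(self, f: Callable[[T], U]) -> "Maybe[U]":
        raise NotImplementedError

    def ap(self, mb: "Maybe[Any]") -> "Maybe[Any]":
        raise NotImplementedError

    def bind(self, f: Callable[[T], "Maybe[U]"]) -> "Maybe[U]":
        raise NotImplementedError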

Now we need our derivatives:

@dataclass(frozen=True)
class Some(Maybe[T]):
    value: T

class Nothing(Maybe[Any]):
    ...

Our Maybe class here defines all of the operations we want to be able to perform on this datatype, but does not implement any of them, leaving the implementations to be filled in by the derivatives. You can expect the implementations in these derived classes to be quite different from each other.

We should end up with something like this:

classDiagram
    class Maybe {
        +map(f)
        +ap(mb)
        +bind(f)
    }
    class Some {
        +value: T
    }
    class Nothing
    Maybe <|-- Some
    Maybe <|-- Nothing

Functor: Mapping over values

A Functor is anything you can map a function over. In Haskell, the generic Functor version of this is called fmap:

fmap (+1) (Just 10)   -- Just 11
fmap (+1) Nothing     -- Nothing

The flow of values through map (or fmap) looks like this:

flowchart LR
    A[Some 10] -->|map +1| B[Some 11]
    C[Nothing] -->|map +1| D[Nothing]

For our Python implementation, we implement map like this:

# "Some" implementation
def map(self, f: Callable[[T], U]) -> "Maybe[U]":
    try:
        return Some(f(self.value))
    except Exception:
        return Nothing()

# "Nothing" implementation
def map(self, f: Callable[[Any], U]) -> "Maybe[U]":
    return self

We can now implement the example using these functions:

Some(10).map(lambda x: x + 1)   # Some(11)
Nothing().map(lambda x: x + 1)  # Nothing()

The idea: if there’s a value inside, apply the function. If not, do nothing.

Applicative: Combining values

Applicatives let us apply functions that are also inside the box. In Haskell, this is the <*> operator:

pure (+) <*> Just 2 <*> Just 40   -- Just 42

Here we’re applying a function wrapped in Some to a value wrapped in Some. If either side is Nothing, the result is Nothing.

flowchart LR
    F[Some f] -->|ap| V[Some 2]
    V --> R[Some f2]
    FN[Some f] -->|ap| N[Nothing]
    N --> RN[Nothing]
    N2[Nothing] -->|ap| V2[Some 2]
    V2 --> RN2[Nothing]

For our Python implementation, we’ll call this ap.

The Some implementation takes a function out of one box, and applies it to the value inside another box:

def ap(self: "Maybe[Callable[[T], U]]", mb: "Maybe[T]") -> "Maybe[U]":
    func = self.value
    return mb.map(func)

The Nothing implementation just returns itself:

def ap(self: "Maybe[Callable[[Any], U]]", mb: "Maybe[Any]") -> "Maybe[U]":
    return self

This lets us combine multiple values when both boxes are full:

add = lambda x: lambda y: x + y      # curried, so it can be applied one argument at a time
Some(add).ap(Some(2)).ap(Some(40))   # Some(42)
Some(add).ap(Some(2)).ap(Nothing())  # Nothing()

Monad: Sequencing computations

A Monad takes things further: it lets us chain together computations that themselves return a Maybe.

In Haskell, this is the >>= operator (bind):

halfIfEven :: Int -> Maybe Int
halfIfEven x = if even x then Just (x `div` 2) else Nothing

Just 10 >>= halfIfEven    -- Just 5
Just 3  >>= halfIfEven    -- Nothing

Here we’re chaining a computation that itself returns a Maybe. If the starting point is Nothing, or if the function returns Nothing, the whole chain collapses.

flowchart LR
    S[Some x] --bind f--> FOUT[Some y]
    S --bind g--> GOUT[Nothing]
    N[Nothing] --bind f--> NRES[Nothing]

In Python we implement bind:

# "Some" implementation
def bind(self, f: Callable[[T], Maybe[U]]) -> "Maybe[U]":
    try:
        return f(self.value)
    except Exception:
        return Nothing()

# "Nothing" implementation
def bind(self, f: Callable[[Any], Maybe[U]]) -> "Maybe[U]":
    return self

And use it like this:

def half_if_even(x: int) -> Maybe[int]:
    return Some(x // 2) if x % 2 == 0 else Nothing()

Some(10).bind(half_if_even)   # Some(5)
Some(3).bind(half_if_even)    # Nothing()

Notice how the “empty box” propagates: if at any point we hit Nothing, the rest of the chain is skipped.

You’ll also see a common pattern across all of the Nothing implementations: there’s no computation, each one simply returns itself. As soon as you hit Nothing, the rest of the chain is short-circuited.
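
To see the short-circuit over a slightly longer chain (reusing half_if_even from above):

Some(20).bind(half_if_even).bind(half_if_even)   # Some(5)
Some(10).bind(half_if_even).bind(half_if_even)   # Nothing() - 5 is odd, so the second step bails out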

Do Notation (Syntactic Sugar)

Haskell makes monadic code look imperative with do notation:

do
  a <- Just 4
  b <- halfIfEven a
  return (a + b)

In Python, we can approximate this style using a generator-based decorator. Each yield unwraps a Maybe, and the whole computation short-circuits if we ever see Nothing.
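
The decorator itself lives in the gist; here’s a minimal sketch of how such a maybe_do could be written (my own approximation, so the gist’s version may differ in detail):

from functools import wraps

def maybe_do(gen_func):
    @wraps(gen_func)
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        try:
            current = next(gen)                       # first yielded Maybe
            while True:
                if isinstance(current, Nothing):
                    return current                    # short-circuit on the empty box
                current = gen.send(current.value)     # unwrap and resume the generator
        except StopIteration as stop:
            return Maybe.some(stop.value)             # wrap the final return value
    return wrapper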

@maybe_do
def pipeline(start: int):
    a = yield Some(start + 1)
    b = yield half_if_even(a)
    c = yield Maybe.from_predicate(b + 3, lambda n: n > 4)
    return a + b + c

print(pipeline(3))  # Some(11)
print(pipeline(1))  # Nothing()

This isn’t strictly necessary, but it makes larger chains of monadic code read like straight-line Python.

Wrapping Up

By porting Maybe into Python and implementing map, ap, and bind, we’ve seen how Functors, Applicatives, and Monads aren’t magic at all — just structured patterns for working with values in context.

  • Functor: apply a function inside the box.
  • Applicative: apply a function that’s also in a box.
  • Monad: chain computations that each return a box.

Haskell bakes these ideas into the language; in Python, we can experiment with them explicitly. The result is safer, more composable code — and maybe even a little functional fun.

Kerberos on Linux

Introduction

Kerberos is one of those protocols that sounds mysterious until you see it in action. The moment you type kinit, run klist, and watch a ticket pop up, it clicks: this is Single Sign-On in its rawest form. In this post we’ll set up a tiny realm on a Debian test box (koffing.local), get a ticket-granting ticket (TGT), and then use it for SSH without typing a password.

What is Kerberos?

Born at MIT’s Project Athena in the 1980s, Kerberos solved campus-wide single sign-on over untrusted networks. It matured through v4 to Kerberos 5 (the standard you use today). It underpins enterprise SSO in Windows domains (Active Directory) and many UNIX shops.

Kerberos authenticates clients to services without sending reusable secrets. You authenticate once to the KDC, get a TGT (Ticket Granting Ticket), then use it to obtain per-service tickets from the TGS (Ticket Granting Service).

Services trust the KDC, not your password.

Core terms

  • Realm: Admin boundary (e.g., LOCAL).
  • Principal: Identity in the realm, like michael@LOCAL (user) or host/koffing.local@LOCAL (service).
  • KDC: The authentication authority. Runs on koffing.local as krb5kdc and kadmind.
  • TGT: Your “hall pass.” Lets you ask the KDC for service tickets.
  • Service ticket: What you present to a service (e.g., SSHD on koffing.local) to prove identity.
  • Keytab: File holding long-term service keys (like for sshd). Lets the service authenticate without storing a password.

Here’s a visual representation of how the Kerberos flow operates:

sequenceDiagram
    participant U as User
    participant AS as KDC/AS
    participant TGS as KDC/TGS
    participant S as Service (e.g., SSHD)
    U->>AS: AS-REQ (I am michael)
    AS-->>U: AS-REP (TGT + session key)
    U->>TGS: TGS-REQ (I want ticket for host/koffing.local)
    TGS-->>U: TGS-REP (service ticket)
    U->>S: AP-REQ (here's my service ticket)
    S-->>U: AP-REP (optional) + access granted

Ok, with all of that out of the way we can get to setting up.

Setup

There are a few packages to install and a little bit of configuration to do. These instructions are written for a Debian/Ubuntu flavour of Linux, but they shouldn’t be too far off for other distributions.

Install the packages

We install the Key Distribution Center (krb5-kdc), the Administration Server (krb5-admin-server), and some client utilities (krb5-user).

sudo apt update
sudo apt install -y krb5-kdc krb5-admin-server krb5-user

Configure your realm

The fully qualified name of the virtual machine I’m testing this on is koffing.local. Change these values to suit your environment.

Edit /etc/krb5.conf and make sure it looks like this:

[libdefaults]
  default_realm = LOCAL
  rdns = false
  dns_lookup_kdc = false
  forwardable = true

[realms]
  LOCAL = {
    kdc = koffing.local
    admin_server = koffing.local
  }

[domain_realm]
  .local = LOCAL
  koffing.local = LOCAL

Make sure your host resolves correctly:

hostname -f        # should print: koffing.local (for me)

getent hosts koffing.local
# If needed, add to /etc/hosts:
# 127.0.1.1   koffing.local koffing

Create the KDC database

Now we initialize the database that will hold all of your principals, policies, realms, etc.

sudo mkdir -p /var/lib/krb5kdc
sudo kdb5_util create -s -r LOCAL
# set the KDC master password when prompted

Start the daemons:

sudo systemctl enable --now krb5-kdc krb5-admin-server
sudo systemctl status krb5-kdc krb5-admin-server --no-pager

Add principals

Create an admin and a user:

sudo kadmin.local -q "addprinc admin/admin"
sudo kadmin.local -q "addprinc michael"

Hello, Kerberos!

Now it’s time to give this a quick test. You can get a ticket with the following:

kdestroy
kinit michael
klist

You should see something similar to the following:

Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: michael@LOCAL

Valid starting     Expires            Service principal
13/09/25 16:14:32  14/09/25 02:14:32  krbtgt/LOCAL@LOCAL
	renew until 14/09/25 16:14:28

That’s your TGT — Kerberos is alive.

Troubleshooting

Kerberos is famously unforgiving about typos and hostname mismatches. Here are some quick checks if things go sideways:

Check hostnames / FQDNs

hostname -f                   # should print koffing.local
getent hosts koffing.local

If these don’t line up, Kerberos tickets won’t match the service principal name.

Check if the KDC is running

sudo systemctl status krb5-kdc krb5-admin-server --no-pager

Look at logs (Debian uses journalctl instead of flat log files):

sudo journalctl -u krb5-kdc -u krb5-admin-server -b --no-pager

Verbose kinit to see exactly what’s happening:

KRB5_TRACE=/dev/stderr kinit -V michael

This will show you which hostnames it resolves, which tickets it requests, and where it fails.

List all principals in the KDC database:

sudo kadmin.local -q "listprincs"

Clear your credential cache if tickets get stale:

kdestroy

The two most common pitfalls are:

  • Hostname mismatch
  • Realm mismatch (default realm not set in /etc/krb5.conf).

SSO

So, we’ve got the proof of concept going, but it would be good to see this in action. What we’ll cover in this next section is getting the sshd service to trust our Kerberos tickets. This will allow for passwordless SSH for the user.

Add the host service principal and keytab

In order to get KDC to vouch for services, those services need principal definitions. A principal is any Kerberos identity. Users get user principals (as we saw above), services also need principals.

sudo kadmin.local -q "addprinc -randkey host/koffing.local"

For SSH on my virtual machine koffing.local, the conventional name is:

host/koffing.local@LOCAL

  • The host/ prefix is the standard for SSH, rsh, and other “host-based” services.
  • The FQDN (koffing.local) must match what the client thinks it is connecting to.
  • @LOCAL is your realm.

When a client does ssh michael@koffing.local, the SSH server needs to prove “I really am host/koffing.local, trusted by the KDC.”

Now we need a keytab.

sudo kadmin.local -q "ktadd -k /etc/krb5.keytab host/koffing.local"

A keytab is a file that stores one or more Kerberos keys (like passwords, but in cryptographic form). Unlike users (who can type passwords into kinit), services can’t type passwords interactively. So the KDC generates a random key for host/koffing.local@LOCAL (-randkey) and you export it into /etc/krb5.keytab with ktadd.

Now sshd can silently use that keytab to decrypt tickets clients send it.

Enable GSSAPI in sshd

The global /etc/ssh/sshd_config needs a couple of flags flicked. The SSH daemon doesn’t implement Kerberos directly, so it uses the GSSAPI library functions provided by MIT Kerberos (or Heimdal) to handle ticket validation. GSSAPI isn’t a protocol itself; it’s an API or abstraction layer.

Once we’ve flipped these switches, we’re telling sshd to accept authentication from any GSSAPI mechanism, which in practice means Kerberos tickets.

# GSSAPI options
GSSAPIAuthentication yes
GSSAPICleanupCredentials yes

This setup needs to be done on every server you want this SSO-style login for. It’s a little confusing in this example because everything is running on the one machine.

Configure your SSH client

There’s also configuration to do on the client side. Clients that want to connect with this type of authentication need the following settings in their ~/.ssh/config:

Host koffing.local
  GSSAPIAuthentication yes
  GSSAPIDelegateCredentials yes

Testing

kdestroy
kinit michael
ssh michael@koffing.local

If everything lines up, ssh should not prompt for a password. Your Kerberos TGT has been used to authenticate silently.

Where Kerberos Fits

Kerberos is ideal for LAN-based authentication: it provides fast, passwordless single sign-on for services like SSH, Postgres, and intranet HTTP apps. But it isn’t designed for cross-organization web or mobile use.

Modern protocols like OIDC (OpenID Connect) build on OAuth 2.0 to provide authentication and federation across the public internet. They use signed tokens, redirect flows, and JSON-based metadata — making them better suited for SaaS, cloud apps, and mobile clients.

In short: Kerberos is the right tool inside the castle walls; OIDC is the right tool when your users are everywhere.

Wrap-up

We’ve stood up a Kerberos realm (LOCAL), issued a TGT for a user (michael), and used it for passwordless SSH into the same box. That’s enough to demystify Kerberos: no secrets flying across the network, just short-lived tickets granted by a trusted KDC.

There’s plenty more we could accomplish from here: service principals for HTTP or Postgres, or even cross-realm trust.

Hello, Jail: A Quick Introduction to FreeBSD Jails

FreeBSD Jails are one of the earliest implementations of operating system-level virtualization—dating back to the early 2000s, long before Docker popularized the idea of lightweight containers. Despite their age, jails remain a powerful, flexible, and minimal way to isolate services and processes on FreeBSD systems.

This post walks through a minimal “Hello World” setup using Jails, with just enough commentary to orient new users and show where jails shine in the modern world of virtualization.

Why Jails?

A FreeBSD jail is a chroot-like environment with its own file system, users, network interfaces, and process table. But unlike chroot, jails extend control to include process isolation, network access, and fine-grained permission control. They’re more secure, more flexible, and more deeply integrated into the FreeBSD base system.

Here’s how jails compare with some familiar alternatives:

  • Versus VMs: Jails don’t emulate hardware or run separate kernels. They’re faster to start, lighter on resources, and simpler to manage. But they’re limited to the same FreeBSD kernel as the host.
  • Versus Docker: Docker containers typically run on a Linux host and rely on a container runtime, layered filesystems, and extensive tooling. Jails are simpler, arguably more robust, and don’t require external daemons. However, they lack some of the ecosystem and portability benefits that Docker brings.

If you’re already running FreeBSD and want to isolate services or test systems with minimal overhead, jails are a perfect fit.

Setup

Let’s build a bare-bones jail. The goal here is simplicity: get a jail running with minimal commands. This is the BSD jail equivalent of “Hello, World.”

# Make a directory to hold the jail
mkdir hw

# Install a minimal FreeBSD userland into that directory
sudo bsdinstall jail /home/michael/src/jails/hw

# Start the jail with a name, IP address, and a shell
sudo jail -c name=hw host.hostname=hw.example.org \
    ip4.addr=192.168.1.190 \
    path=/home/michael/src/jails/hw \
    command=/bin/sh

You now have a running jail named hw, with a hostname and IP, running a shell isolated from the host system.

192.168.1.190 is just a static address picked arbitrarily by me. For you, you’ll want to pick an address that is reachable on your local network.

Poking Around

With your jail up and running, you can start working with it. To enter the jail, you can use the following:

sudo jexec hw /bin/sh

jexec also lets you run any command you need inside the jail:

sudo jexec hw ls /

Querying

You can list running jails with:

jls

You should see something like this:

JID  IP Address      Hostname                      Path
2    192.168.1.190   hw.example.org                /home/michael/src/jails/hw

You can also look at what’s currently running in the jail:

ps -J hw

You should see the /bin/sh process:

PID TT  STAT    TIME COMMAND
2390  5  I+J  0:00.01 /bin/sh

Finishing up

To terminate the jail:

sudo jail -r hw

This is a minimal setup with no automated networking, no jail management frameworks, and no persistent configuration. And that’s exactly the point: you can get a working jail in three commands and tear it down just as easily.

When to Use Jails

Jails make sense when:

  • You want process and network isolation on FreeBSD without the overhead of full VMs.
  • You want to run multiple versions of a service (e.g., Postgres 13 and 15) on the same host.
  • You want stronger guarantees than chroot provides for service containment.
  • You’re building or testing FreeBSD-based systems and want a reproducible sandbox.

For more complex jail setups, FreeBSD offers tools like ezjail, iocage, and bastille that add automation and persistence. But it’s worth knowing how the pieces fit together at the core.

Conclusion

FreeBSD jails offer a uniquely minimal, powerful, and mature alternative to both VMs and containers. With just a few commands, you can create a secure, isolated environment for experimentation, testing, or even production workloads.

This post only scratched the surface, but hopefully it’s enough to get you curious. If you’re already on FreeBSD, jails are just sitting there, waiting to be used—no extra software required.