Cogs and Levers: A blog full of technical stuff

Create your own Filesystem with FUSE

Introduction

FUSE (Filesystem in Userspace) is a Linux kernel interface, with a companion user-space library, that lets you implement your own filesystems entirely in user space. No kernel hacking required. With it, building your own virtual filesystem becomes surprisingly achievable and even… fun.

In today’s article, we’ll build a filesystem that’s powered entirely by HTTP. Every file operation — reading a file, listing a directory, even getting file metadata — will be handled by a REST API. On the client side, we’ll use libcurl to perform HTTP calls from C, and on the server side, a simple Python Flask app will serve as our in-memory file store.

Along the way, you’ll learn how to:

  • Use FUSE to handle filesystem operations in user space
  • Make REST calls from C using libcurl
  • Create a minimal RESTful backend for serving file content
  • Mount and interact with your filesystem like any other directory

I've added this project to my GitHub repository if you'd like to pull it down and try it. It's called restfs.

Let’s get into it.

Defining a FUSE Filesystem

Every FUSE-based filesystem starts with a fuse_operations struct. This is essentially a table of function pointers — you provide implementations for the operations you want your filesystem to support.

Here’s the one used in restfs:

static struct fuse_operations restfs_ops = {
    .getattr = restfs_getattr,
    .readdir = restfs_readdir,
    .open    = restfs_open,
    .read    = restfs_read
};

This tells FUSE: “When someone calls stat() on a file, use restfs_getattr. When they list a directory, use restfs_readdir, and so on.”

Let’s break these down:

  • getattr: Fills in a struct stat with metadata about a file or directory — size, mode, timestamps, etc. It’s the equivalent of stat(2).
  • readdir: Lists the contents of a directory. It’s how ls knows what to show.
  • open: Verifies that a file can be opened. You don’t need to return a file descriptor — just confirm the file exists and is readable.
  • read: Reads data from a file into a buffer. This is where the real I/O happens.

Each function corresponds to a familiar POSIX operation. For this demo, we’re implementing just the basics — enough to mount the FS, ls it, and cat a file.

If you leave an operation out, FUSE assumes it’s unsupported — for example, we haven’t implemented write, mkdir, or unlink, so the filesystem will be effectively read-only.
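
To bring the filesystem up, this struct is handed to fuse_main(), which parses the standard mount options and runs the event loop. A minimal main() using the classic FUSE 2.x entry point would look something like this (restfs's real main() also deals with its --base option before delegating):

#define FUSE_USE_VERSION 26
#include <fuse.h>

int main(int argc, char *argv[]) {
   /* fuse_main() mounts at the path given on the command line (honouring
      flags like -f for foreground) and dispatches kernel requests to the
      handlers registered in restfs_ops. */
   return fuse_main(argc, argv, &restfs_ops, NULL);
}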

Making REST Calls from C with libcurl

To interact with our HTTP-based server, we use libcurl, a powerful and flexible HTTP client library for C. In restfs, we wrap libcurl in a helper function called http_io() that performs an HTTP request and returns a parsed response object.

Here’s the core of the function:

struct _rest_response* http_io(const char *url, const char *body, const char *type) {
   CURL *curl = NULL;
   CURLcode res;
   long status = 0L;

   struct _http_write_buffer buf;
   buf.data = malloc(1);
   buf.size = 0;

   curl = curl_easy_init();

   if (curl) {
      curl_easy_setopt(curl, CURLOPT_URL, url);
      curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, type);

      if (body) {
         curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
         curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, strlen(body));
      }

      curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, http_write_callback);
      curl_easy_setopt(curl, CURLOPT_WRITEDATA, (void *)&buf);

      curl_easy_setopt(curl, CURLOPT_USERAGENT, _http_user_agent);

      struct curl_slist *headers = NULL;
      headers = curl_slist_append(headers, "Content-Type: application/json");
      curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

      res = curl_easy_perform(curl);
      curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
      curl_easy_cleanup(curl);
      curl_slist_free_all(headers);

      if (res != CURLE_OK) {
         fprintf(stderr, "error: %s\n", curl_easy_strerror(res));
         if (buf.data) free(buf.data);
         return NULL;
      }
   } else {
      /* curl_easy_init() failed; don't fabricate an empty response */
      free(buf.data);
      return NULL;
   }

   return rest_make_response(buf.data, buf.size, status);
}

Let’s break it down:

  • curl_easy_init() creates a new easy handle.
  • CURLOPT_URL sets the request URL.
  • CURLOPT_CUSTOMREQUEST lets us specify GET, POST, PUT, DELETE, etc.
  • If a body is provided (e.g. for POST/PUT), we pass it in using CURLOPT_POSTFIELDS.
  • CURLOPT_WRITEFUNCTION and CURLOPT_WRITEDATA capture the server’s response into a buffer.
  • Headers are added manually to indicate we’re sending/expecting JSON.
  • After the call, we extract the HTTP status code and clean up.

The result is returned as a _rest_response struct:

struct _rest_response {
   int status;
   json_object *json;
   char *data;     // raw response body
   size_t length;  // response size in bytes
};

This makes it easy to access either the full raw data or a parsed JSON object depending on the use case.

To parse the JSON responses from the server, we use the json-c library — a lightweight and widely used C library for working with JSON data. This allows us to easily extract fields like st_mode, st_size, or timestamps directly from the server’s responses.

To simplify calling common HTTP methods, we define a few handy macros:

#define rest_get(uri)         http_io(uri, NULL, "GET")
#define rest_delete(uri)      http_io(uri, NULL, "DELETE")
#define rest_post(uri, body)  http_io(uri, body, "POST")
#define rest_put(uri, body)   http_io(uri, body, "PUT")

With these in place, calling a REST endpoint is as simple as:

struct _rest_response *res = rest_get("/getattr?path=/hello.txt");

This layer abstracts away the curl boilerplate so each FUSE handler can focus on interpreting the result.
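
As a sketch of how a handler sits on top of this layer, here is roughly what restfs_getattr could look like, assuming the FUSE 2.x getattr signature and a hypothetical base_url() helper that returns the --base address (the real implementation is in the repo):

static int restfs_getattr(const char *path, struct stat *st) {
   char uri[1024];

   /* base_url() is a hypothetical helper; a real version would also
      URL-encode the path */
   snprintf(uri, sizeof(uri), "%s/getattr?path=%s", base_url(), path);

   struct _rest_response *res = rest_get(uri);

   if (!res || res->status != 200)
      return -ENOENT;

   memset(st, 0, sizeof(*st));

   /* pull the fields we need out of the JSON body via json-c */
   json_object *field;
   if (json_object_object_get_ex(res->json, "st_mode", &field))
      st->st_mode = json_object_get_int(field);
   if (json_object_object_get_ex(res->json, "st_size", &field))
      st->st_size = json_object_get_int64(field);

   return 0;
}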

The Backend

So far we’ve focused on the FUSE client — how file operations are translated into HTTP requests. But for the system to work, we need something on the other side of the wire to respond.

Enter: a minimal Python server built with Flask.

This server acts as a fake in-memory filesystem. It knows nothing about actual disk files — it just stores a few predefined paths and returns metadata and file contents in response to requests.

Let’s look at the key parts:

  • A Python dictionary (fs) holds a small set of files and their byte contents.
  • The /getattr endpoint returns a JSON version of struct stat for a given file path.
  • The /readdir endpoint lists all available files (we only support the root directory).
  • The /read endpoint returns a slice of the file contents, based on offset and size.

Here’s a simplified version of the server:

from flask import Flask, request, jsonify
from urllib.parse import unquote
import stat, time

app = Flask(__name__)
fs = { '/hello.txt': b"Hello, RESTFS!\n" }

def now(): return { "tv_sec": int(time.time()), "tv_nsec": 0 }

@app.route('/getattr')
def getattr():
    path = unquote(request.args.get('path', ''))
    # stat field names here are assumed to mirror struct stat;
    # the repo has the full version
    if path == "/":
        return jsonify({ "st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2,
                         "st_atim": now(), "st_mtim": now(), "st_ctim": now() })
    if path in fs:
        return jsonify({ "st_mode": stat.S_IFREG | 0o644, "st_nlink": 1,
                         "st_size": len(fs[path]),
                         "st_atim": now(), "st_mtim": now(), "st_ctim": now() })
    return ('Not Found', 404)

@app.route('/readdir')
def readdir():
    return jsonify([name[1:] for name in fs.keys()])  # ['hello.txt']

@app.route('/read')
def read():
    path = unquote(request.args.get('path', ''))
    if path not in fs:
        return ('Not Found', 404)
    offset = int(request.args.get('offset', 0))
    size = int(request.args.get('size', 4096))
    return fs[path][offset:offset+size]

This is enough to make ls and cat work on the mounted filesystem. The client calls getattr and readdir to explore the directory, and uses read to pull down bytes from the file.
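
You can also sanity-check the server on its own, before FUSE enters the picture:

curl 'http://localhost:5000/getattr?path=/hello.txt'
curl 'http://localhost:5000/read?path=/hello.txt&offset=0&size=4096'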

End to End

With the server running and the client compiled, we can now bring it all together.

Start the Flask server in one terminal:

python server.py

Then, in another terminal, create a mountpoint and run the restfs client:

mkdir /tmp/restmnt
./restfs --base http://localhost:5000/ /tmp/restmnt -f

Now try interacting with your mounted filesystem just like any other directory:

➜  restmnt ls -l
total 1
-rw-r--r-- 1 michael michael  6 Jan  1  1970 data.bin
-rw-r--r-- 1 michael michael 15 Jan  1  1970 hello.txt

➜  restmnt cat hello.txt
Hello, RESTFS!

You should see logs from the server indicating incoming requests:

[GETATTR] path=/
127.0.0.1 - - [18/Aug/2025 21:29:46] "GET /getattr?path=/ HTTP/1.1" 200 -
[READDIR] path=/
127.0.0.1 - - [18/Aug/2025 21:29:46] "GET /readdir?path=/ HTTP/1.1" 200 -
[GETATTR] path=/hello.txt
127.0.0.1 - - [18/Aug/2025 21:29:46] "GET /getattr?path=/hello.txt HTTP/1.1" 200 -
127.0.0.1 - - [18/Aug/2025 21:29:47] "GET /open?path=/hello.txt HTTP/1.1" 200 -
127.0.0.1 - - [18/Aug/2025 21:29:47] "GET /read?path=/hello.txt&offset=0&size=4096 HTTP/1.1" 200 -
[GETATTR] path=/
127.0.0.1 - - [18/Aug/2025 21:29:47] "GET /getattr?path=/ HTTP/1.1" 200 -

Under the hood, every file operation is being translated into a REST call, logged by the Flask server, and fulfilled by your in-memory dictionary.

This is where the whole thing becomes delightfully real — you’ve mounted an HTTP API as if it were a native part of your filesystem.

Conclusion

restfs is a fun and minimal example of what FUSE can unlock — filesystems that aren’t really filesystems at all. Instead of reading from disk, we’re routing every file operation over HTTP, backed by a tiny REST server.

While this project is intentionally lightweight and a bit absurd, the underlying ideas are surprisingly practical. FUSE is widely used for things like encrypted filesystems, network mounts, and user-space views over application state. And libcurl remains a workhorse for robust HTTP communication in C programs.

What you’ve seen here is just the start. You could extend restfs to support writing files, persisting data to disk, mounting a remote object store, or even representing entirely virtual data (like logs, metrics, or debug views).

Sometimes the best way to understand a system is to reinvent it — badly, on purpose.

Time Integration in Physics Simulations

Introduction

When simulating physical systems—whether it’s a bouncing ball, orbiting planets, or particles under gravity—accurately updating positions and velocities over time is crucial. This process is known as time integration, and it’s the backbone of most game physics and real-time simulations.

In this post, we’ll explore two fundamental methods for time integration: Euler’s method and Runge-Kutta 4 (RK4).

We’ll go through how each of these methods is represented mathematically, and then we’ll translate that into code. We’ll build a small visual simulation in Python using pygame to see how the two methods behave differently when applied to the same system.

The Simulation

Our simulation consists of a central massive object (a “sun”) and several orbiting bodies, similar to a simplified solar system. Each body is influenced by the gravitational pull of the others, and we update their positions and velocities in each frame of the simulation loop.

At the heart of this simulation lies a decision: how should we advance these objects forward in time? This is where the integration method comes in.

Euler’s Method

Euler’s method is the simplest way to update motion over time. It uses the current velocity to update position, and the current acceleration to update velocity:

\[\begin{aligned} \vec{x}_{t+\Delta t} &= \vec{x}_t + \vec{v}_t \cdot \Delta t \\\\ \vec{v}_{t+\Delta t} &= \vec{v}_t + \vec{a}_t \cdot \Delta t \end{aligned}\]

This translates down into the following python code:

def step_euler(bodies):
    accs = [compute_acc(bodies, i) for i in range(len(bodies))]
    for b, a in zip(bodies, accs):
        b.pos += b.vel * DT
        b.vel += a * DT

This is easy to implement, but has a major downside: error accumulates quickly, especially in systems with strong forces or rapidly changing directions.
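
One thing the snippets in this post don’t show is the compute_acc helper (or the global DT timestep) that both integrators call into. Here’s a minimal sketch of softened pairwise gravity, assuming each body carries pygame Vector2 pos/vel fields and a mass, with made-up constants:

import pygame

G = 1000.0        # gravitational constant in simulation units (assumed)
DT = 0.02         # timestep (assumed)
SOFTENING = 3.0   # keeps the force finite when two bodies nearly touch

def compute_acc(bodies, i):
    """Net gravitational acceleration on bodies[i] from every other body."""
    acc = pygame.math.Vector2(0, 0)
    for j, other in enumerate(bodies):
        if j == i:
            continue
        delta = other.pos - bodies[i].pos
        dist_sq = delta.length_squared() + SOFTENING ** 2
        # Newtonian gravity: |a| = G * m / r^2, directed along delta
        acc += delta * (G * other.mass / dist_sq ** 1.5)
    return acc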

Here’s an example of it running:

RK4

Runge-Kutta 4 (RK4) improves on Euler by sampling the system at multiple points within a single timestep. It estimates what will happen halfway through the step, not just at the beginning. This gives a much better approximation of curved motion and reduces numerical instability.

Runge-Kutta 4 samples the derivative at four points:

\[\begin{aligned} \vec{k}_1 &= f(t, \vec{y}) \\\\ \vec{k}_2 &= f\left(t + \frac{\Delta t}{2}, \vec{y} + \frac{\vec{k}_1 \cdot \Delta t}{2}\right) \\\\ \vec{k}_3 &= f\left(t + \frac{\Delta t}{2}, \vec{y} + \frac{\vec{k}_2 \cdot \Delta t}{2}\right) \\\\ \vec{k}_4 &= f\left(t + \Delta t, \vec{y} + \vec{k}_3 \cdot \Delta t\right) \\\\ \vec{y}_{t+\Delta t} &= \vec{y}_t + \frac{\Delta t}{6}(\vec{k}_1 + 2\vec{k}_2 + 2\vec{k}_3 + \vec{k}_4) \end{aligned}\]
Combined state: the \(\vec{y}\) vector here represents both position and velocity as a single combined state vector.

This translates down into the following python code:

def step_rk4(bodies):
    n = len(bodies)
    pos0 = [b.pos.copy() for b in bodies]
    vel0 = [b.vel.copy() for b in bodies]

    # k1: derivatives at the start of the step
    a1 = [compute_acc(bodies, i) for i in range(n)]

    # k2: derivatives at the midpoint, stepping there with k1
    for i, b in enumerate(bodies):
        b.pos = pos0[i] + vel0[i] * (DT / 2)
        b.vel = vel0[i] + a1[i] * (DT / 2)
    a2 = [compute_acc(bodies, i) for i in range(n)]

    # k3: derivatives at the midpoint again, now stepping there with k2
    # (b.vel still holds the k2 velocity sample when the position is set)
    for i, b in enumerate(bodies):
        b.pos = pos0[i] + b.vel * (DT / 2)
        b.vel = vel0[i] + a2[i] * (DT / 2)
    a3 = [compute_acc(bodies, i) for i in range(n)]

    # k4: derivatives at the end of the step, using k3
    for i, b in enumerate(bodies):
        b.pos = pos0[i] + b.vel * DT
        b.vel = vel0[i] + a3[i] * DT
    a4 = [compute_acc(bodies, i) for i in range(n)]

    # Combine with the classic RK4 weights. Expanding the four velocity
    # samples shows the position update reduces to DT^2/6 * (a1 + a2 + a3).
    for i, b in enumerate(bodies):
        b.pos = pos0[i] + vel0[i] * DT + (DT**2 / 6) * (a1[i] + a2[i] + a3[i])
        b.vel = vel0[i] + (DT / 6) * (a1[i] + 2*a2[i] + 2*a3[i] + a4[i])

RK4 requires more code and computation, but the visual payoff is immediately clear: smoother orbits, fewer explosions, and longer-lasting simulations.

Here’s an example of it running:

Trade-offs

Euler is fast and simple. It’s great for prototyping, simple games, or systems where precision isn’t critical.

RK4 is more accurate and stable, especially in chaotic or sensitive systems—but it’s computationally more expensive. In real-time applications (like games), you’ll need to weigh performance vs. quality.

Also, both methods depend heavily on the size of the timestep. Larger steps amplify error; smaller ones improve accuracy at the cost of performance.
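
A quick way to feel this trade-off is to test both methods on a system with a known answer. This standalone sketch (separate from the orbit simulation) integrates a 1-D harmonic oscillator, whose exact solution is cos(t), and prints the final error of each method:

import math

DT, STEPS = 0.1, 1000

def euler(x, v):
    return x + v * DT, v - x * DT

def rk4(x, v):
    acc = lambda x: -x                 # a = -x for the unit oscillator
    k1x, k1v = v, acc(x)
    k2x, k2v = v + k1v * DT/2, acc(x + k1x * DT/2)
    k3x, k3v = v + k2v * DT/2, acc(x + k2x * DT/2)
    k4x, k4v = v + k3v * DT,   acc(x + k3x * DT)
    return (x + DT/6 * (k1x + 2*k2x + 2*k3x + k4x),
            v + DT/6 * (k1v + 2*k2v + 2*k3v + k4v))

xe, ve = 1.0, 0.0
xr, vr = 1.0, 0.0
for _ in range(STEPS):
    xe, ve = euler(xe, ve)
    xr, vr = rk4(xr, vr)

exact = math.cos(STEPS * DT)
print(f"euler error: {abs(xe - exact):.3e}")   # large -- the oscillation blows up
print(f"rk4 error:   {abs(xr - exact):.3e}")   # tiny by comparison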

Conclusion

Switching from Euler to RK4 doesn’t just mean writing more code—it fundamentally changes how your simulation evolves over time. If you’re seeing odd behaviors like spiraling orbits, exploding systems, or jittery motion, trying a higher-order integrator like RK4 might fix it.

Or, it might inspire a deeper dive into the world of numerical simulation—welcome to the rabbit hole!

You can find the full code listing here as a gist, so you can tweak and run it for yourself.

Getting Started with ClojureScript

Introduction

I recently decided to dip my toes into ClojureScript. As someone who enjoys exploring different language ecosystems, I figured getting a basic “Hello, World!” running in the browser would be a fun starting point. It turns out that even this small journey taught me quite a bit about how ClojureScript projects are wired together.

This post captures my first successful setup: a minimal ClojureScript app compiled with lein-cljsbuild, rendering output in the browser console.

A Rough Start

I began with the following command to create a new, blank project:

lein new cljtest

The first job from here is to organise dependencies and configure the build system for the project.

project.clj

There are a few things to understand in the configuration of the project:

  • We add org.clojure/clojurescript "1.11.132" as a dependency
  • To assist with our builds, we add the plugin lein-cljsbuild "1.1.8"
  • The source path is normally src, but we change this for ClojureScript to src-cljs
  • The compiled JavaScript output and all of our web assets go into resources/public

(defproject cljtest "0.1.0-SNAPSHOT"
  :min-lein-version "2.9.1"
  :description "Minimal ClojureScript Hello World"
  :dependencies [[org.clojure/clojure "1.11.1"]
                 [org.clojure/clojurescript "1.11.132"]]
  :plugins [[lein-cljsbuild "1.1.8"]]
  :source-paths ["src-cljs"]
  :clean-targets ^{:protect false} ["resources/public/js" "target"]

  :cljsbuild
  {:builds
   {:dev
    {:source-paths ["src-cljs"]
     :compiler {:main cljtest.core
                :output-to "resources/public/js/main.js"
                :output-dir "resources/public/js/out"
                :asset-path "js/out"
                :optimizations :none
                :source-map true
                :pretty-print true}}

    :prod
    {:source-paths ["src-cljs"]
     :compiler {:main cljtest.core
                :output-to "resources/public/js/main.js"
                :optimizations :advanced
                :pretty-print false}}}})

We have two different build configurations here: dev and prod.

The dev configuration focuses on being much quicker to build so that the change / update cycle during development is quicker. Source maps, pretty printing, and no optimisations provide the verbose output appropriate for debugging.

The prod configuration applies all the optimisations. This build is slower, but produces one single output file: main.js. This is the configuration that you use to “ship” your application.

Your First ClojureScript File

Place this in src-cljs/cljtest/core.cljs:

(ns cljtest.core)

(enable-console-print!)
(println "Hello from ClojureScript!")

HTML Page to Load It

Create a file at resources/public/index.html:

<!doctype html>
<html>
  <head><meta charset="utf-8"><title>cljtest</title></head>
  <body>
    <h1>cljtest</h1>
    <script src="js/out/goog/base.js"></script>
    <script src="js/main.js"></script>
    <script>goog.require('cljtest.core');</script>
  </body>
</html>

Build & Run

Compile your dev build:

lein clean
lein cljsbuild once dev

Then open resources/public/index.html in your browser, and check the developer console — you should see your message.

If you want to iterate while coding:

lein cljsbuild auto dev

When you’re ready to build a production bundle:

lein cljsbuild once prod

Then you can simplify the HTML:

<script src="js/main.js"></script>

No goog.require needed — it all gets bundled.

Step it up

Next, we’ll step up to something a little more useful. We’ll put together a table of names that we can add, edit, delete, etc. Just a really simple CRUD style application.

In order to do this, we’re going to rely on a pretty cool library called reagent.

We add the following dependency to project.clj:

[reagent "1.0.0"]

State

Our little application requires some state:

(defonce names (r/atom [{:id 1 :name "Alice"}
                        {:id 2 :name "Bob"}]))

(defonce next-id (r/atom 3))
(defonce editing-id (r/atom nil))
(defonce edit-text (r/atom ""))

names is the current list of names. next-id gives us the next value that we’ll use as an ID when adding a new record. editing-id and edit-text manage the state for updates.
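
One thing the snippets don’t show is the namespace declaration that provides the r/ and dom/ aliases (dom/render appears in the mounting code below). Something along these lines at the top of core.cljs wires it up; the exact aliases are my assumption:

(ns cljtest.core
  (:require [reagent.core :as r]
            [reagent.dom :as dom]
            [clojure.string]))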

Table

We can now render our table using a simple function:

(defn name-table []
  [:div
   [:h2 "Name Table"]
   [:table
    [:thead
     [:tr [:th "Name"] [:th "Edit"] [:th "Delete"]]]
    [:tbody
     (for [n @names]
       ^{:key (:id n)} [name-row n])]]
   [:div
    [:input {:placeholder "New name"
             :value @edit-text
             :on-change #(reset! edit-text (.. % -target -value))}]
    [:button {:on-click
              #(when-not (clojure.string/blank? @edit-text)
                 (swap! names conj {:id @next-id :name @edit-text})
                 (swap! next-id inc)
                 (reset! edit-text ""))}
     "Add"]]])

The table renders all of the names, and also handles the create case. The edit case is a little more complex and gets a function of its own: name-row manages this complexity for us. (Note that name-table refers to name-row, so in the actual source file name-row must be defined first, or forward-declared with (declare name-row).)

(defn name-row [{:keys [id name]}]
  [:tr
   [:td name]
   [:td
    (if (= id @editing-id)
      [:<>
       [:input {:value @edit-text
                :on-change #(reset! edit-text (.. % -target -value))}]
       [:button {:on-click
                 (fn []
                   (swap! names (fn [ns]
                                  (mapv (fn [n]
                                          (if (= (:id n) id)
                                            (assoc n :name @edit-text)
                                            n))
                                        ns)))
                   (reset! editing-id nil))}
        "Save"]]
      [:<>
       [:button {:on-click #(do (reset! editing-id id)
                                (reset! edit-text name))}
        "Edit"]])]
   [:td
    [:button {:on-click
              (fn []
                (swap! names (fn [ns]
                                (vec (remove (fn [n] (= (:id n) id)) ns)))))}
     "Delete"]]])

Mounting!

Now we’re going to make sure that these functions end up on our web page.

(defn mount-root []
  (dom/render [name-table] (.getElementById js/document "app")))

(defn init []
  (enable-console-print!)
  (mount-root))

We need an app element in our HTML page.

<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <title>cljtest</title>
  </head>
  <body>
    <h1>cljtest</h1>

    <!-- This is our new element! -->
    <div id="app"></div>

    <script src="js/out/goog/base.js"></script>
    <script src="js/main.js"></script>
    <script>goog.require('cljtest.core'); cljtest.core.init();</script>

  </body>
</html>

Conclusion

This journey started with a humble goal: get a simple ClojureScript app running in the browser. Along the way, I tripped over version mismatches, namespace assumptions, and nested anonymous functions — but I also discovered the elegance of Reagent and the power of functional UIs in ClojureScript.

While the setup using lein-cljsbuild and Reagent 1.0.0 may feel a bit dated, it’s still a solid way to learn the fundamentals. From here, I’m looking forward to exploring more advanced tooling like Shadow CLJS, integrating external JavaScript libraries, and building more interactive UIs.

This was my first real toe-dip into ClojureScript, and already I’m hooked. Stay tuned — there’s more to come.

Understanding the Transformer Architecture

Introduction

Natural language processing (NLP) has gone through several paradigm shifts:

  • Bag-of-Words — treated text as unordered word counts; no sequence information. We’ve spoken about this previously.
  • Word Embeddings (word2vec, GloVe) — learned fixed-vector representations that captured meaning. We’ve looked at these previously.
  • RNNs, LSTMs, GRUs — processed sequences token-by-token, retaining a hidden state; struggled with long-range dependencies due to vanishing gradients.
  • Seq2Seq with Attention — attention helped the model “focus” on relevant input tokens; a leap in translation and summarization.
  • Transformers (Vaswani et al., 2017 — “Attention Is All You Need”) — replaced recurrence entirely with self-attention, allowing parallelization and longer context handling.

Transformers didn’t just improve accuracy; they unlocked the ability to scale models massively.

In this post, we’ll walk through the transformer architecture by implementing a GPT-style Transformer from scratch in PyTorch, from tokenization to text generation.

The goal: make the architecture concrete and understandable, not magical.

Overview

At a high level, our model will:

  1. Tokenize text into integers.
  2. Map tokens to dense embeddings + positional encodings.
  3. Apply self-attention to mix contextual information.
  4. Use feed-forward networks for per-token transformations.
  5. Wrap attention + FFN in Transformer Blocks with residual connections and layer normalization.
  6. Project back to vocabulary logits.
  7. Generate text autoregressively.

graph TD
  A[Text Input] --> B[Tokenizer]
  B --> C[Token Embeddings + Positional Encoding]
  C --> D[Transformer Block × N]
  D --> E[Linear Projection to Vocabulary Size]
  E --> F[Softmax Probabilities]
  F --> G[Sample / Argmax Next Token]
  G -->|Loop| C

Tokenization

Before our model can process text, we need to turn characters into numbers it can work with — a process called tokenization. In this example, we use a simple byte-level tokenizer, which treats every UTF-8 byte as its own token. This keeps the implementation minimal while still being able to represent any possible text without building a custom vocabulary.

class ByteTokenizer:
    """
    UTF-8 bytes <-> ints in [0..255].
    NOTE: For production models you'd use a subword tokenizer (BPE, SentencePiece).
    """
    def __init__(self) -> None:
        self.vocab_size = 256

    def encode(self, text: str) -> list[int]:
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        return bytes(ids).decode("utf-8", errors="ignore")

Example:

tok = ByteTokenizer()
ids = tok.encode("Hello")
print(ids)        # [72, 101, 108, 108, 111]
print(tok.decode(ids))  # "Hello"

Embeddings & Positional Encoding

Once we have token IDs, we map them into embedding vectors — learned dense representations that capture meaning in a continuous space. Each token ID indexes a row in an embedding matrix, turning a discrete integer into a trainable vector of size \(d_{\text{model}}\). Because self-attention alone has no sense of order, we also add positional embeddings, giving the model information about each token’s position within the sequence.

self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
self.pos_emb = nn.Embedding(block_size, d_model)   # positional embeddings
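
To make the shapes concrete, here’s a tiny standalone check (sizes picked arbitrarily):

import torch
import torch.nn as nn

tok_emb = nn.Embedding(256, 128)   # vocab_size=256, d_model=128
pos_emb = nn.Embedding(64, 128)    # block_size=64

idx = torch.randint(0, 256, (2, 10))           # batch of 2 sequences, 10 tokens each
x = tok_emb(idx) + pos_emb(torch.arange(10))   # positions broadcast across the batch
print(x.shape)                                 # torch.Size([2, 10, 128])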

Self-Attention

Self-attention lets each token attend to all previous tokens (causally masked to prevent peeking ahead).

Mathematically:

\[\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right) V\]

That equation means each token computes a similarity score with all other tokens (via \(QK^\top\)), scales it by \(\sqrt{d_k}\) to stabilize gradients, turns the scores into probabilities with softmax, and then uses those probabilities to take a weighted sum of the value vectors \(V\) to produce its new representation.

Multi-head attention runs this in parallel on different projections.

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model, n_heads, block_size, dropout):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)
        self.attn_drop = nn.Dropout(dropout)
        self.resid_drop = nn.Dropout(dropout)
        mask = torch.tril(torch.ones(block_size, block_size, dtype=torch.bool))
        self.register_buffer("causal_mask", mask)

    def forward(self, x):
        B, T, C = x.shape
        qkv = self.qkv(x)
        q, k, v = qkv.chunk(3, dim=-1)
        def split_heads(t): return t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        scores = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        scores = scores.masked_fill(~self.causal_mask[:T, :T], float("-inf"))
        att = F.softmax(scores, dim=-1)
        att = self.attn_drop(att)
        y = att @ v
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        y = self.out_proj(y)
        y = self.resid_drop(y)
        return y

Feed-Forward Network

A per-token MLP, applied identically at each position.

class FeedForward(nn.Module):
    def __init__(self, d_model, mult=4, dropout=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, mult * d_model),
            nn.GELU(),
            nn.Linear(mult * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)

This tiny two-layer neural network can be broken down as follows:

  • Input: token embedding vector (size \(d_{\text{model}}\)).
  • Linear layer: expands to \(\text{mult} \times d_{\text{model}}\).
  • GELU activation: introduces non-linearity.
  • Linear layer: projects back to \(d_{\text{model}}\).
  • Dropout: randomly zeroes some activations during training for regularization.

Transformer Block

A Transformer block applies pre-layer normalization, runs the data through a multi-head self-attention layer and then a feed-forward network (FFN), and adds a residual connection around each. This structure is stacked multiple times to deepen the model.

graph TD
  A[Input] --> B[LayerNorm]
  B --> C[Multi-Head Self-Attention]
  C --> D[Residual Add]
  D --> E[LayerNorm]
  E --> F[Feed-Forward Network]
  F --> G[Residual Add]
  G --> H[Output to Next Block]

class TransformerBlock(nn.Module):
    def __init__(self, d_model, n_heads, block_size, dropout):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.attn = MultiHeadSelfAttention(d_model, n_heads, block_size, dropout)
        self.ffn  = FeedForward(d_model, mult=4, dropout=dropout)

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.ffn(self.ln2(x))
        return x

GPT-Style Model Head & Loss

After token and position embeddings are summed, the data flows through a stack of Transformer blocks, each applying self-attention and a feed-forward transformation with residual connections.
Once all blocks have run, we apply a final LayerNorm to normalize the hidden state vectors and keep training stable.

From there, each token’s hidden vector is projected back into vocabulary space — producing a vector of raw scores (logits) for each possible token in the vocabulary.

We also use weight tying here: the projection matrix for mapping hidden vectors to logits is the same matrix as the token embedding layer’s weights.
This reduces the number of parameters, ensures a consistent mapping between tokens and embeddings, and has been shown to improve generalization.

Mathematically, weight tying can be expressed as:

\[\text{logits} = H \cdot E^\top\]

where \(H\) is the matrix of hidden states from the final Transformer layer, and \(E\) is the embedding matrix from the input token embedding layer. This means the output projection reuses (shares) the same weights as the input embedding, just transposed.

class TinyGPT(nn.Module):
    def __init__(self, vocab_size, d_model=128, n_layers=2, n_heads=4, block_size=64, dropout=0.1):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList([
            TransformerBlock(d_model, n_heads, block_size, dropout)
            for _ in range(n_layers)
        ])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        self.head.weight = self.tok_emb.weight
        self.apply(self._init_weights)

    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, mean=0.0, std=0.02)
            if m.bias is not None: nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Embedding):
            nn.init.normal_(m.weight, mean=0.0, std=0.02)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        assert T <= self.block_size
        tok = self.tok_emb(idx)
        pos = self.pos_emb(torch.arange(T, device=idx.device))
        x = self.drop(tok + pos)
        for blk in self.blocks:
            x = blk(x)
        x = self.ln_f(x)
        logits = self.head(x)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(
                logits.view(B * T, -1),
                targets.view(B * T)
            )
        return logits, loss

Generation Loop

This method performs autoregressive text generation: we start with some initial tokens, repeatedly predict the next token, append it, and feed the result back into the model.

Key concepts:

  • Autoregressive: generation proceeds one token at a time, conditioning on all tokens so far.
  • Temperature: scales the logits before softmax; values < 1.0 make predictions sharper/more confident, > 1.0 make them more random.
  • Top-k filtering: keeps only the k highest-probability tokens and sets all others to negative infinity before sampling, which limits randomness to plausible options.

Step-by-step in generate():

  1. Crop context: keep only the last block_size tokens to match the model’s maximum context window.
  2. Forward pass: get logits for each position in the sequence.
  3. Select last step’s logits: we only want the prediction for the next token.
  4. Adjust for temperature (optional).
  5. Apply top-k filtering (optional).
  6. Softmax: convert logits into a probability distribution.
  7. Sample: randomly choose the next token according to the probabilities.
  8. Append: add the new token to the sequence and repeat.

This loop continues until max_new_tokens tokens have been generated.

@torch.no_grad()
def generate(self, idx, max_new_tokens, temperature=1.0, top_k=None):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -self.block_size:]
        logits, _ = self(idx_cond)
        logits = logits[:, -1, :]
        if temperature != 1.0:
            logits = logits / temperature
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            thresh = v[:, [-1]]
            logits = torch.where(logits < thresh, torch.full_like(logits, float("-inf")), logits)
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
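
To see temperature and top-k in isolation, here’s a tiny sketch on a hand-made logit vector:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.5, -1.0]])

logits = logits / 0.8                     # temperature < 1.0 sharpens the distribution
v, _ = torch.topk(logits, k=2)            # keep the two best-scoring tokens
logits = logits.masked_fill(logits < v[:, [-1]], float("-inf"))

print(F.softmax(logits, dim=-1))          # two non-zero probabilities; the rest are 0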

In Practice

That concludes the entire stack we need, and we can start to ask questions of this very basic model. Just remember: this is a tiny model, so the results won’t be amazing, but they’ll give you a sense of how tokens are generated.

After training briefly on a small excerpt of Moby Dick plus a few Q/A lines, we can get:

Q: Why does he go to sea?
A: To drive off the spleen and regulate the circulation.

Even a tiny model learns local structure.
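
Putting the pieces together, a minimal sampling sketch looks like this (assuming generate() is attached to TinyGPT as a method; an untrained model will only emit noise until you run the training loop from the linked code):

import torch

tok = ByteTokenizer()
model = TinyGPT(vocab_size=tok.vocab_size, block_size=64)
model.eval()

prompt = torch.tensor([tok.encode("Q: Why does he go to sea?\nA:")], dtype=torch.long)
out = model.generate(prompt, max_new_tokens=60, temperature=0.8, top_k=40)
print(tok.decode(out[0].tolist()))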

Conclusion

Even though this isn’t a model that will challenge the big players, I hope this has been a useful step-by-step walkthrough of how the transformer architecture is put together.

A full version of the code referenced in this article can be found here. The code here includes the training loop so you can run it end-to-end.

D-Bus

Introduction

D-Bus (Desktop Bus) is an inter-process communication (IPC) system used on Linux and other Unix-like systems. It allows different programs — even running as different users — to send messages and signals to each other without needing to know each other’s implementation details.

Main ideas

  • Message bus: A daemon (dbus-daemon) runs in the background and acts as a router for messages between applications.
  • Two main buses:
    • System bus – for communication between system services and user programs (e.g., NetworkManager, systemd, BlueZ).
    • Session bus – for communication between applications in a user’s desktop session (e.g., a file manager talking to a thumbnailer).
  • Communication model:
    • Method calls – like function calls between processes.
    • Signals – broadcast events (e.g., “Wi-Fi disconnected”).
    • Properties – read/write state values.
  • Naming:
    • Bus names – unique or well-known IDs for services (e.g., org.freedesktop.NetworkManager).
    • Object paths – hierarchical paths (e.g., /org/freedesktop/NetworkManager).
    • Interfaces – namespaces for methods/signals (e.g., org.freedesktop.NetworkManager.Device).

Here’s a visual representation of the architecture:

flowchart LR
  subgraph AppLayer[User Applications]
    A1[App 1]
    A2[App 2]
  end

  subgraph DBusDaemon[D-Bus Daemon Message Bus]
    D1[System Bus]
    D2[Session Bus]
  end

  subgraph SysServices[System Services]
    S1[NetworkManager]
    S2[BlueZ Bluetooth]
    S3[systemd-logind]
  end

  %% Connections
  A1 --method calls or signals--> D2
  A2 --method calls or signals--> D2
  S1 --method calls or signals--> D1
  S2 --method calls or signals--> D1
  S3 --method calls or signals--> D1

  %% Cross communication
  D1 <-->|routes messages| A1
  D1 <-->|routes messages| A2
  D2 <-->|routes messages| A1
  D2 <-->|routes messages| A2

  %% System bus to service connections
  D1 <-->|routes messages| S1
  D1 <-->|routes messages| S2
  D1 <-->|routes messages| S3

User applications place method calls or emit signals on a bus inside the D-Bus daemon. The daemon routes these messages to the appropriate services, and responses travel back to the applications over the same bus.

D-Bus removes the need for each program to implement its own custom IPC protocol. It’s widely supported by desktop environments, system services, and embedded Linux stacks.

In this article, we’ll walk through some basic D-Bus usage, building up to a few practical use cases.

busctl

busctl lets you interact with D-Bus from the terminal. According to the man page:

busctl may be used to introspect and monitor the D-Bus bus.

We can start by listing all connected peers:

busctl list

This shows a list of service names for software and services currently on your system’s bus.

Devices

If you have NetworkManager running, you’ll see org.freedesktop.NetworkManager in the list.
You can query all available devices with:

busctl call org.freedesktop.NetworkManager /org/freedesktop/NetworkManager \
  org.freedesktop.NetworkManager GetDevices

Example output:

ao 6 "/org/freedesktop/NetworkManager/Devices/1" "/org/freedesktop/NetworkManager/Devices/2" "/org/freedesktop/NetworkManager/Devices/3" "/org/freedesktop/NetworkManager/Devices/4" "/org/freedesktop/NetworkManager/Devices/5" "/org/freedesktop/NetworkManager/Devices/6"
What is ao 6? The reply is prefixed with its type signature: ao is an array of object paths, and 6 is the number of elements that follow.

Those object paths aren’t very descriptive, so you can query one for its interface name:

busctl get-property org.freedesktop.NetworkManager \
  /org/freedesktop/NetworkManager/Devices/1 \
  org.freedesktop.NetworkManager.Device Interface

On my system:

s "lo"

The leading s tells us this is a string — here, the loopback adapter.

Introspect

You can list all properties, methods, and signals for a given object with:

busctl introspect org.freedesktop.NetworkManager \
  /org/freedesktop/NetworkManager/Devices/1

Or without the pager:

busctl --verbose --no-pager introspect org.freedesktop.NetworkManager \
  /org/freedesktop/NetworkManager/Devices/1

Desktop Notifications

Now that we can query D-Bus, we can also send messages.
For example, you could end a shell script with a visual notification on your desktop:

gdbus call --session \
  --dest org.freedesktop.Notifications \
  --object-path /org/freedesktop/Notifications \
  --method org.freedesktop.Notifications.Notify \
  "my-app" 0 "" "Build finished" "All tests passed" \
  '[]' '{"urgency": <byte 1>}' 5000

Tip: gdbus is part of the glib2 or glib2-tools package on many distributions.

This performs a method call on a D-Bus object.

  • --dest — The bus name (service) to talk to.
  • --object-path — The specific object inside that service.
  • --method — The method we want to invoke.

This method’s signature is s u s s s as a{sv} i, meaning:

Code   Type                   Example Value             Meaning
s      string                 "my-app"                  Application name
u      uint32                 0                         Notification ID (0 = new)
s      string                 ""                        Icon name/path
s      string                 "Build finished"          Title
s      string                 "All tests passed"        Body text
as     array of strings       '[]'                      Action identifiers
a{sv}  dict<string, variant>  '{"urgency": <byte 1>}'   Hints (urgency: 0=low, 1=normal, 2=critical)
i      int32                  5000                      Timeout (ms)
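
The same call can be made from Python via PyGObject’s Gio bindings. A sketch, assuming the python3-gi package is installed (package names vary by distribution):

import gi
gi.require_version("Gio", "2.0")
from gi.repository import Gio, GLib

bus = Gio.bus_get_sync(Gio.BusType.SESSION, None)
reply = bus.call_sync(
    "org.freedesktop.Notifications",     # bus name (service)
    "/org/freedesktop/Notifications",    # object path
    "org.freedesktop.Notifications",     # interface
    "Notify",                            # method
    GLib.Variant("(susssasa{sv}i)",
                 ("my-app", 0, "", "Build finished", "All tests passed",
                  [], {"urgency": GLib.Variant("y", 1)}, 5000)),
    GLib.VariantType("(u)"),             # Notify replies with the notification id
    Gio.DBusCallFlags.NONE, -1, None)

print(reply.unpack()[0])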

Monitoring

D-Bus also lets you watch messages as they pass through.
To monitor all system bus messages (root may be required):

busctl monitor --system

To filter for a specific destination:

busctl monitor org.freedesktop.NetworkManager

These commands stream events to your console in real time.

Conclusion

D-Bus is a quiet but powerful layer in modern Linux desktops and servers. Whether you’re inspecting running services, wiring up automation, or building new desktop features, learning to speak D-Bus gives you a direct line into the heart of the system. Once you’ve mastered a few core commands, the rest is just exploring available services and imagining what you can automate next.