Natural language processing (NLP) has gone through several paradigm shifts:
Bag-of-Words — treated text as unordered word counts; no sequence information. We’ve spoken about this previously.
Word Embeddings (word2vec, GloVe) — learned fixed-vector representations that captured meaning. We’ve looked at these previously.
RNNs, LSTMs, GRUs — processed sequences token-by-token, retaining a hidden state; struggled with long-range dependencies due to vanishing gradients.
Seq2Seq with Attention — attention helped the model “focus” on relevant input tokens; a leap in translation and summarization.
Transformers (Vaswani et al., 2017 — “Attention Is All You Need”) — replaced recurrence entirely with self-attention, allowing parallelization and longer context handling.
Transformers didn’t just improve accuracy; they unlocked the ability to scale models massively.
In this post, we’ll build up an understanding of the transformer architecture by implementing a GPT-style
Transformer from scratch in PyTorch, from tokenization to text generation.
The goal: make the architecture concrete and understandable, not magical.
Overview
At a high level, our model will:
Tokenize text into integers.
Map tokens to dense embeddings + positional encodings.
Apply self-attention to mix contextual information.
Use feed-forward networks for per-token transformations.
Wrap attention + FFN in Transformer Blocks with residual connections and layer normalization.
Project back to vocabulary logits.
Generate text autoregressively.
graph TD
A[Text Input] --> B[Tokenizer]
B --> C[Token Embeddings + Positional Encoding]
C --> D[Transformer Block × N]
D --> E[Linear Projection to Vocabulary Size]
E --> F[Softmax Probabilities]
F --> G[Sample / Argmax Next Token]
G -->|Loop| C
Tokenization
Before our model can process text, we need to turn characters into numbers it can work with — a process called
tokenization. In this example, we use a simple byte-level tokenizer, which treats every UTF-8 byte as its own token.
This keeps the implementation minimal while still being able to represent any possible text without building a custom
vocabulary.
class ByteTokenizer:
    """
    UTF-8 bytes <-> ints in [0..255].
    NOTE: For production models you'd use a subword tokenizer (BPE, SentencePiece).
    """
    def __init__(self) -> None:
        self.vocab_size = 256

    def encode(self, text: str) -> list[int]:
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        return bytes(ids).decode("utf-8", errors="ignore")
Once we have token IDs, we map them into embedding vectors — learned dense representations that capture meaning in
a continuous space. Each token ID indexes a row in an embedding matrix, turning a discrete integer into a trainable
vector of size \(d_{\text{model}}\). Because self-attention alone has no sense of order, we also add
positional embeddings, giving the model information about each token’s position within the sequence.
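As a small sketch of that lookup (the sizes d_model and block_size and the batch shape are illustrative choices here, not values from the original code):

import torch
import torch.nn as nn

vocab_size, d_model, block_size = 256, 128, 64   # illustrative sizes

tok_emb = nn.Embedding(vocab_size, d_model)      # token id -> learned vector
pos_emb = nn.Embedding(block_size, d_model)      # position -> learned vector

idx = torch.randint(0, vocab_size, (1, 16))      # (batch=1, seq_len=16) token ids
pos = torch.arange(idx.size(1))                  # positions 0..15
x = tok_emb(idx) + pos_emb(pos)                  # (1, 16, d_model), now order-aware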
Self-Attention
At the heart of the model is scaled dot-product attention:
\[\text{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V\]
That equation means each token computes a similarity score with all other tokens (via \(QK^\top\)), divides it
by \(\sqrt{d_k}\) to stabilize gradients, turns the scores into probabilities with softmax, and then uses those
probabilities to take a weighted sum of the value vectors \(V\) to produce its new representation.
Multi-head attention runs this in parallel on different projections.
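A minimal sketch of that computation, including the causal mask a GPT-style decoder needs (function and argument names are ours):

import math
import torch
import torch.nn.functional as F

def attention(q, k, v, causal=True):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)     # pairwise similarity
    if causal:
        T = scores.size(-1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))  # hide future tokens
    weights = F.softmax(scores, dim=-1)                   # rows sum to 1
    return weights @ v                                    # weighted sum of V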
Feed-Forward Network
After attention, each token passes through a position-wise feed-forward network (FFN):
Linear layer: expands to \(\text{mult} \times d_{\text{model}}\).
GELU activation: introduces non-linearity.
Linear layer: projects back to \(d_{\text{model}}\).
Dropout: randomly zeroes some activations during training for regularization.
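Those four steps translate directly into a small module (mult = 4 is the conventional expansion factor; the dropout rate here is illustrative):

import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, d_model: int, mult: int = 4, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, mult * d_model),  # expand
            nn.GELU(),                           # non-linearity
            nn.Linear(mult * d_model, d_model),  # project back
            nn.Dropout(dropout),                 # regularize during training
        )

    def forward(self, x):
        return self.net(x)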
Transformer Block
A Transformer block applies pre-layer normalization, runs the data through a multi-head self-attention layer,
and adds a residual connection; it then repeats the pattern with a feed-forward network (FFN): normalize,
transform, add the residual. This structure is stacked multiple times to deepen the model.
graph TD
A[Input] --> B[LayerNorm]
B --> C[Multi-Head Self-Attention]
C --> D[Residual Add]
D --> E[LayerNorm]
E --> F[Feed-Forward Network]
F --> G[Residual Add]
G --> H[Output to Next Block]
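In code, the block is a thin wrapper around the two sublayers (a sketch, assuming a MultiHeadSelfAttention module built around the attention function above, plus the FeedForward class):

import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = MultiHeadSelfAttention(d_model, n_heads)  # assumed defined elsewhere
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = FeedForward(d_model)

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # pre-norm attention + residual add
        x = x + self.ffn(self.ln2(x))   # pre-norm FFN + residual add
        return x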
After token and position embeddings are summed, the data flows through a stack of Transformer blocks, each applying
self-attention and a feed-forward transformation with residual connections.
Once all blocks have run, we apply a final LayerNorm to normalize the hidden state vectors and keep training stable.
From there, each token’s hidden vector is projected back into vocabulary space — producing a vector of raw
scores (logits) for each possible token in the vocabulary.
We also use weight tying here: the projection matrix for mapping hidden vectors to logits is the same matrix as
the token embedding layer’s weights.
This reduces the number of parameters, ensures a consistent mapping between tokens and embeddings, and has been shown
to improve generalization.
Mathematically, weight tying can be expressed as:
\[\text{logits} = H \cdot E^\top\]
where \(H\) is the matrix of hidden states from the final Transformer layer, and \(E\) is the embedding matrix
from the input token embedding layer. This means the output projection reuses (shares) the same weights as the input
embedding, just transposed.
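In PyTorch, tying is a single assignment (a sketch; the attribute names are ours):

import torch.nn as nn

class TiedHead(nn.Module):
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # logits = H @ E^T

    def forward(self, h):
        return self.lm_head(h)  # hidden states -> vocabulary logits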
Text Generation
Generation is autoregressive: we start with some initial tokens, repeatedly predict the next token, append it,
and feed the result back into the model. This is the job of the generate() method, sketched after the steps below.
Key concepts:
Autoregressive: generation proceeds one token at a time, conditioning on all tokens so far.
Temperature: scales the logits before softmax; values < 1.0 make predictions sharper/more confident, > 1.0 make them more random.
Top-k filtering: keeps only the k highest-probability tokens and sets all others to negative infinity before sampling, which limits randomness to plausible options.
Step-by-step in generate():
Crop context: keep only the last block_size tokens to match the model’s maximum context window.
Forward pass: get logits for each position in the sequence.
Select last step’s logits: we only want the prediction for the next token.
Adjust for temperature (optional).
Apply top-k filtering (optional).
Softmax: convert logits into a probability distribution.
Sample: randomly choose the next token according to the probabilities.
Append: add the new token to the sequence and repeat.
This loop continues until max_new_tokens tokens have been generated.
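A minimal sketch of that loop (parameter names mirror the steps above; the full version in the repository may differ):

import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, temperature=1.0, top_k=None):
    # idx: (batch, seq_len) tensor of token ids already in the context
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]              # 1. crop to the context window
        logits = model(idx_cond)                     # 2. forward pass
        logits = logits[:, -1, :] / temperature      # 3-4. last step, temperature-scaled
        if top_k is not None:                        # 5. top-k filtering
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float("-inf")
        probs = F.softmax(logits, dim=-1)            # 6. probabilities
        next_id = torch.multinomial(probs, 1)        # 7. sample
        idx = torch.cat([idx, next_id], dim=1)       # 8. append and repeat
    return idx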
That concludes the entire stack that we need. We can start to ask questions of this very basic model. Just remember,
this is a tiny model so results are not going to be amazing, but it will give you a sense of how these tokens are
generated.
After training briefly on a small excerpt of Moby Dick plus a few Q/A lines, we can get:
Q: Why does he go to sea?
A: To drive off the spleen and regulate the circulation.
Even a tiny model learns local structure.
Conclusion
Even though this isn’t a model that will challenge the big players, I hope this has been a useful step-by-step
walkthrough of how the transformer architecture is put together.
A full version of the code referenced in this article can be found here.
The code here includes the training loop so you can run it end-to-end.
D-Bus (Desktop Bus) is an inter-process communication (IPC) system
used on Linux and other Unix-like systems. It allows different programs — even running as different users — to send
messages and signals to each other without needing to know each other’s implementation details.
Main ideas
Message bus: A daemon (dbus-daemon) runs in the background and acts as a router for messages between applications.
Two main buses:
System bus – for communication between system services and user programs (e.g., NetworkManager, systemd, BlueZ).
Session bus – for communication between applications in a user’s desktop session (e.g., a file manager talking to a thumbnailer).
Communication model:
Method calls – like function calls between processes.
Interfaces – namespaces for methods/signals (e.g., org.freedesktop.NetworkManager.Device).
Here’s a visual representation of the architecture:
flowchart LR
subgraph AppLayer[User Applications]
A1[App 1]
A2[App 2]
end
subgraph DBusDaemon[D-Bus Daemon Message Bus]
D1[System Bus]
D2[Session Bus]
end
subgraph SysServices[System Services]
S1[NetworkManager]
S2[BlueZ Bluetooth]
S3[systemd-logind]
end
%% Connections
A1 --method calls or signals--> D2
A2 --method calls or signals--> D2
S1 --method calls or signals--> D1
S2 --method calls or signals--> D1
S3 --method calls or signals--> D1
%% Cross communication
D1 <-->|routes messages| A1
D1 <-->|routes messages| A2
D2 <-->|routes messages| A1
D2 <-->|routes messages| A2
%% System bus to service connections
D1 <-->|routes messages| S1
D1 <-->|routes messages| S2
D1 <-->|routes messages| S3
User applications call methods or raise signals to a Session Bus inside the D-Bus Daemon. In turn,
these messages are routed to System Services, with responses sent back to the applications via the bus.
D-Bus removes the need for each program to implement its own custom IPC protocol. It’s widely supported by desktop
environments, system services, and embedded Linux stacks.
In this article, we’ll walk through some basic D-Bus usage, building up to a few practical use cases.
busctl
busctl lets you interact with D-Bus from the terminal. According to the man page:
busctl may be used to introspect and monitor the D-Bus bus.
We can start by listing all connected peers:
busctl list
This shows a list of service names for software and services currently on your system’s bus.
Devices
If you have NetworkManager running, you’ll see org.freedesktop.NetworkManager in the list.
You can query all available devices with:
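One way is with gdbus, calling NetworkManager’s GetDevices method (shown here as a sketch; the original command may differ in details):

gdbus call --system \
  --dest org.freedesktop.NetworkManager \
  --object-path /org/freedesktop/NetworkManager \
  --method org.freedesktop.NetworkManager.GetDevices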
Tip: gdbus is part of the glib2 or glib2-tools package on many distributions.
This performs a method call on a D-Bus object.
--dest — The bus name (service) to talk to.
--object-path — The specific object inside that service.
--method — The method we want to invoke.
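Notifications
The same flags work on the session bus. For example, the desktop notification service exposes a Notify method; a sketch of calling it (assuming a daemon implementing org.freedesktop.Notifications; the argument values match the table below):

gdbus call --session \
  --dest org.freedesktop.Notifications \
  --object-path /org/freedesktop/Notifications \
  --method org.freedesktop.Notifications.Notify \
  "my-app" 0 "" "Build finished" "All tests passed" \
  '[]' '{"urgency": <byte 1>}' 5000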
This method’s signature is s u s s s as a{sv} i, meaning:
| Code | Type | Example Value | Meaning |
|------|------|---------------|---------|
| s | string | "my-app" | Application name |
| u | uint32 | 0 | Notification ID (0 = new) |
| s | string | "" | Icon name/path |
| s | string | "Build finished" | Title |
| s | string | "All tests passed" | Body text |
| as | array of strings | '[]' | Action identifiers |
| a{sv} | dict<string, variant> | '{"urgency": <byte 1>}' | Hints (0=low, 1=normal, 2=critical) |
| i | int32 | 5000 | Timeout (ms) |
Monitoring
D-Bus also lets you watch messages as they pass through.
To monitor all system bus messages (root may be required):
busctl monitor --system
To filter for a specific destination:
busctl monitor org.freedesktop.NetworkManager
These commands stream events to your console in real time.
Conclusion
D-Bus is a quiet but powerful layer in modern Linux desktops and servers. Whether you’re inspecting running services,
wiring up automation, or building new desktop features, learning to speak D-Bus gives you a direct line into the heart
of the system. Once you’ve mastered a few core commands, the rest is just exploring available services and
imagining what you can automate next.
TL;DR: std::move doesn’t move anything by itself. It’s a cast that permits moving. Real moves happen in your
type’s move constructor/assignment. Use them to trade deep copies for cheap pointer swaps and to unlock container
performance—provided you mark them noexcept.
The motivating example
We’ll anchor everything on a tiny heap-owning type. It’s intentionally “unsafe” (raw new[]/delete[]) so the
ownership transfer is easy to see in logs.
#include <iostream>
#include <utility>   // for std::move
#include <algorithm> // for std::copy
#include <cstddef>   // for size_t

struct my_object {
    int* data;
    size_t size;

    // Constructor
    my_object(size_t n) : data(new int[n]), size(n) {
        std::cout << "Constructed (" << this << ") size=" << size
                  << " data=" << data << "\n";
    }

    // Copy constructor
    my_object(const my_object& other)
        : data(new int[other.size]), size(other.size) {
        std::copy(other.data, other.data + size, data);
        std::cout << "Copied from (" << &other << ") to (" << this << ")"
                  << " data=" << data << "\n";
    }

    // Move constructor
    my_object(my_object&& other) noexcept
        : data(other.data), size(other.size) {
        other.data = nullptr;
        other.size = 0;
        std::cout << "Moved from (" << &other << ") to (" << this << ")"
                  << " data=" << data << "\n";
    }

    // Destructor
    ~my_object() {
        std::cout << "Destroying (" << this << ") data=" << data << "\n";
        delete[] data;
    }
};

int main() {
    std::cout << "--- Create obj1 ---\n";
    my_object obj1(5);

    std::cout << "\n--- Copy obj1 into obj2 ---\n";
    my_object obj2 = obj1;            // Calls copy constructor

    std::cout << "\n--- Move obj1 into obj3 ---\n";
    my_object obj3 = std::move(obj1); // Calls move constructor

    std::cout << "\n--- End of main ---\n";
}
When you run this you’ll see:
One deep allocation
One deep copy (new buffer), and
One move (no allocation; just pointer steal).
The destructor logs reveal that ownership was transferred and that the moved-from object was neutered.
Having a brief look at the output (from my machine, at least):
--- Create obj1 ---
Constructed (0x7ffd8c960858) size=5 data=0x5616824336c0
--- Copy obj1 into obj2 ---
Copied from (0x7ffd8c960858) to (0x7ffd8c960848) data=0x5616824336e0
--- Move obj1 into obj3 ---
Moved from (0x7ffd8c960858) to (0x7ffd8c960838) data=0x5616824336c0
--- End of main ---
Destroying (0x7ffd8c960838) data=0x5616824336c0
Destroying (0x7ffd8c960848) data=0x5616824336e0
Destroying (0x7ffd8c960858) data=0
Constructed: obj1 allocates a buffer at 0x5616824336c0.
Copied: obj2 gets its own buffer (0x5616824336e0) and the contents are duplicated from obj1.
At this point, both obj1 and obj2 own separate allocations.
Moved: obj3 simply takes ownership of obj1’s buffer (0x5616824336c0) without allocating.
obj1’s data pointer is nulled out (data=0), leaving it valid but empty.
Destruction order: obj3 frees obj1’s original buffer, obj2 frees its own copy, and finally obj1 frees nothing (because it’s been neutered by the move).
The contrasting addresses make it easy to see:
Copies produce different data pointers.
Moves result in pointer reuse.
What problem do move semantics solve?
Before C++11, passing/returning big objects often meant deep copies or awkward workarounds. Containers like
std::vector<T> also had a problem: on reallocation they could only copy elements. If copying T was expensive or
forbidden, performance cratered.
Move semantics (C++11) let a type say: “If you no longer need the source object, I can steal its resources
instead of allocating/copying them.” This unlocks:
Returning large objects by value efficiently.
Growing containers without copying payloads.
Expressing one-time ownership transfers cleanly.
Conclusion
In this small example we only wrote a move constructor, but real-world resource-owning classes often need both move
and copy operations, plus move assignment. The full “rule of five” ensures your type behaves correctly in all
situations — and marking moves noexcept can make a big difference in container performance.
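For completeness, a move assignment for my_object could look like this (a sketch in the same style; it would live inside the struct):

my_object& operator=(my_object&& other) noexcept {
    if (this != &other) {
        delete[] data;         // release our current buffer
        data = other.data;     // steal the source's buffer
        size = other.size;
        other.data = nullptr;  // neuter the moved-from object
        other.size = 0;
    }
    return *this;
}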
Move semantics solve a big problem, especially when your class encapsulates a lot of data. They’re an elegant tool
that C++ gives you for performance, ownership, and safety.
I’ve always liked the idea that a programming language can feel like a musical instrument.
Last night, I decided to make that idea very literal.
The result is rack — a little Clojure module that models a modular synthesizer. It doesn’t aim to be a complete
DAW or polished softsynth — this is more of an experiment: what if we could patch together oscillators and filters
the way Eurorack folks do, but using s-expressions instead of patch cables?
Clojure’s s-expressions are perfect for this kind of modeling.
A synth module is, in some sense, just a little bundle of state and behavior. In OOP we might wrap that up in a class;
in Clojure, we can capture it as a simple map, plus a few functions that know how to work with it.
The parentheses give us a “patch cable” feel — data and functions connected in readable chains.
A 30-Second Synth Primer
Before we dive into code, a very quick crash course in some synth lingo:
VCO (Voltage-Controlled Oscillator): Produces a periodic waveform — the basic sound source.
LFO (Low-Frequency Oscillator): Like a VCO, but slower, used for modulation (wobble, vibrato, etc.).
VCA (Voltage-Controlled Amplifier): Controls the amplitude of a signal, usually over time.
That’s enough to make the examples readable. We’re here for the Clojure, not the audio theory.
Setup Audio
The first thing we need to do is open an audio output line.
Java’s javax.sound.sampled API is low-level but accessible from Clojure with no extra dependencies.
Three constants — a 48 kHz sample rate (good quality, not too CPU-heavy), 16-bit samples, and mono output.
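Concretely (these defs match the names open-line uses below):

(def sample-rate 48000.0)  ; 48 kHz
(def bits-per-sample 16)   ; 16-bit samples
(def channels 1)           ; mono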
Starting the Audio Engine
(defn ^SourceDataLine open-line
  ([] (open-line sample-rate))
  ([sr]
   (let [fmt (AudioFormat. (float sr) bits-per-sample channels true false) ; signed, little endian
         ^SourceDataLine line (AudioSystem/getSourceDataLine fmt)]
     (.open line fmt 4096) ;; important: use (fmt, bufferSize) overload
     (.start line)
     line)))
Line by line:
Function arity: With no arguments, open-line uses our default sample rate. With one argument, you can pass a custom rate.
AudioFormat.: Creates a format object with:
sr as a float
bits-per-sample bits per sample
channels (mono)
true for signed samples
false for little-endian byte order
AudioSystem/getSourceDataLine: Asks the JVM for a line that matches our format.
.open: Opens the line with a buffer size of 4096 bytes — small enough for low latency, large enough to avoid dropouts.
.start: Starts audio playback.
Returns the SourceDataLine object so we can write samples to it.
This helper prints out all available audio devices (“mixers”) so you can choose one if your machine has multiple outputs.
In cases where you’re struggling to find the appropriate sound mixer, this function can help you diagnose these problems.
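A sketch of such a helper (the name is my choice; it relies only on javax.sound.sampled):

(defn list-mixers
  "Print the name and description of every available mixer."
  []
  (doseq [^javax.sound.sampled.Mixer$Info info
          (javax.sound.sampled.AudioSystem/getMixerInfo)]
    (println (.getName info) "-" (.getDescription info))))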
Starting, Stopping, and Writing Audio
Opening an audio line is one thing — actually feeding it samples in real time is another.
This is where we start talking about frames, buffers, and a little bit of number crunching.
Inputs:
buf: A float array of audio samples, each in the range \([-1.0, 1.0]\).
nframes: How many samples we want to send.
Output:
out: A byte array holding the samples in 16-bit little-endian PCM format.
Scaling floats to integers
Most audio hardware expects integers, not floats. In 16-bit PCM, the range is \([-32768, 32767]\).
We scale a float \(x\) by \(32767.0\):
\[s = \operatorname{round}(x \times 32767)\]
For example:
\[x = 1.0 \Rightarrow s = 32767\]
\[x = -1.0 \Rightarrow s = -32767\]
(close enough; the exact min value is special-cased in PCM)
Breaking into bytes
Because 16 bits = 2 bytes, we split the integer into:
Low byte: \(s \,\&\, 0xFF\)
High byte: \((s \gg 8) \,\&\, 0xFF\)
We store them in little-endian order — low byte first — so the audio hardware interprets them correctly.
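Putting the scaling and byte-splitting together, a sketch of write-frames! (the real helper may reuse buffers or differ in signature):

(defn write-frames!
  "Convert nframes floats in [-1.0, 1.0] to 16-bit little-endian PCM and write them."
  [^SourceDataLine line ^floats buf nframes]
  (let [out (byte-array (* 2 nframes))]
    (dotimes [i nframes]
      (let [x (max -1.0 (min 1.0 (aget buf i)))  ; clamp to [-1, 1]
            s (int (Math/round (* x 32767.0)))]  ; scale to 16-bit range
        (aset-byte out (* 2 i)
                   (unchecked-byte (bit-and s 0xFF)))                       ; low byte first
        (aset-byte out (inc (* 2 i))
                   (unchecked-byte (bit-and (bit-shift-right s 8) 0xFF))))) ; then high byte
    (.write line out 0 (alength out))))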
graph LR
A[Modules] --> B[Mix Function]
B --> C[Float Buffer]
C --> D[write-frames!]
D --> E[16-bit PCM Bytes]
E --> F[Audio Line]
F --> G[Speakers / Headphones]
Stopping audio isn’t just hitting a “pause” button:
:running? tells the audio thread to exit its loop.
.join waits briefly for that thread to finish.
.drain ensures any remaining samples in the buffer are played before stopping.
.stop and .close free the hardware resources.
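A sketch of stop-audio! implementing exactly those steps:

(defn stop-audio!
  "Stop the audio thread and release the output line."
  []
  (let [{:keys [thread line]} @engine]
    (swap! engine assoc :running? false)      ; tell the loop to exit
    (when thread (.join ^Thread thread 200))  ; wait briefly for the thread
    (when line
      (try
        (.drain ^SourceDataLine line)         ; play any buffered samples
        (.stop ^SourceDataLine line)
        (.close ^SourceDataLine line)
        (catch Throwable _))))
  :ok)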
Starting audio in real time
(defn start-audio!
  "Start real-time audio. Call stop-audio! to end."
  ([] (start-audio! sample-rate 1024))
  ([sr block-size]
   (stop-audio!)
   (ensure-main-mixer!)
   (let [^SourceDataLine line (open-line sr)
         runner (doto (Thread.
                        (fn []
                          (try
                            (let [ctx (make-ctx sr)]
                              (while (:running? @engine)
                                (let [cache (atom {})
                                      mix ((:pull ctx) cache ctx "main-mixer" :out block-size)]
                                  (write-frames! line mix block-size))))
                            (catch Throwable e
                              (.printStackTrace e))
                            (finally
                              (try
                                (.drain line)
                                (.stop line)
                                (.close line)
                                (catch Throwable _))))))
                  (.setDaemon true))]
     (reset! engine {:running? true :thread runner :line line})
     (.start runner)
     :ok)))
This is where the magic loop happens:
block-size is how many frames we process at a time — small enough for low latency, large enough to avoid CPU overload.
We open the line, then spin up a daemon thread so it won’t block JVM shutdown.
Inside the loop:
make-ctx builds a context with our sample rate.
(:pull ctx) asks the “main mixer” module for the next block-size frames.
We hand those frames to write-frames! to push them to the audio hardware.
When :running? goes false, the loop exits, drains the buffer, and closes the line.
How block size relates to latency
Audio latency is fundamentally the time between “we computed samples” and “we hear them.” For a block-based engine,
one irreducible component is the block latency:
\[L_{\text{block}} = \frac{\text{block-size}}{\text{sample rate}}\]
At 48 kHz with the default 1024-frame block, that is \(1024 / 48000 \approx 21\ \text{ms}\).
Lowering block-size reduces compute-to-play latency but increases CPU overhead (more wakeups, more function calls).
Similarly, if your device/driver allows a smaller .open buffer, you can shave additional milliseconds — at the risk
of underruns (clicks/pops). The sweet spot depends on your machine.
Keeping Track of your Patch
A modular synth is basically:
A set of modules (oscillators, filters, VCAs…)
A set of connections between module outputs and inputs
Some engine state for playback
We’ll keep these in atoms so we can mutate them interactively in the REPL.
(defonce ^:private registry (atom {})) ; id -> module
(defonce ^:private cables (atom #{}))  ; set of {:from [id port] :to [id port] :gain g}
(defonce ^:private engine (atom {:running? false :thread nil :line nil}))
registry: All modules in the patch, keyed by ID.
cables: All connections, each with from/to module IDs and ports, plus an optional gain.
engine: Tracks whether audio is running, plus the playback thread and output line.
This wipes everything so you can start a new patch. No modules. No cables.
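Something like this does the job (the function name here is a guess):

(defn reset-patch!
  "Remove every module and cable, leaving a blank patch."
  []
  (reset! registry {})
  (reset! cables #{})
  :ok)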
Adding modules and cables
(defn- register!
  "Store a module in the registry and return its id."
  [m]
  (swap! registry assoc (:id m) m)
  (:id m))

(defn add-cable
  "Connect module output → input. Optional gain (defaults 1.0)."
  ([from-id from-port to-id to-port]
   (add-cable from-id from-port to-id to-port 1.0))
  ([from-id from-port to-id to-port gain]
   (swap! cables conj {:from [from-id (keyword from-port)]
                       :to   [to-id (keyword to-port)]
                       :gain (double gain)})
   :ok))
register! stores a module in the registry and returns its ID. add-cable creates a connection between two module ports — think of it as digitally plugging in a patch cable.
These functions are basic data structure management.
Setting parameters generically
(defn set-param!
  "Set a module parameter (e.g., (set-param! \"vco1\" :freq 440.0))."
  [id k v]
  (when-let [st (:state (@registry id))]
    (swap! st assoc k v))
  :ok)
Because each module stores its state in a map, we can update parameters without knowing the module’s internals. This is
one of the joys of modeling in Clojure — generic operations fall out naturally.
Pulling Signal
Up to now we can open the device, stream audio, and keep track of a patch.
But how do modules actually produce samples for each block?
We use a pull-based model: when the engine needs N frames from a module’s output port, it asks that module to
render. If the module depends on other modules (its inputs), it pulls those first, mixes/filters them, and returns a
buffer.
This naturally walks the patch graph from outputs back to sources and avoids doing work we don’t need.
We scan @cables for any connection whose :to is exactly [to-id to-port].
The result is a (possibly empty) sequence of “incoming patch cables”.
This is intentionally tiny; the interesting part comes when we combine the sources.
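A sketch of that lookup (sum-into below calls it by this name):

(defn- connections-into
  "All cables whose destination is exactly [to-id to-port]."
  [to-id to-port]
  (filter #(= (:to %) [to-id to-port]) @cables))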
Summing signals into a buffer
(defn- sum-into
  "Sum all signals connected to [id port] into a float-array of nframes."
  [cache ctx id port nframes]
  (let [conns (connections-into id port)]
    (if (seq conns)
      (let [acc (float-array nframes)]
        (doseq [{:keys [from gain]} conns
                :let [[src-id src-port] from
                      buf ((:pull ctx) cache ctx src-id src-port nframes)
                      g (float gain)]]
          (dotimes [i nframes]
            (aset-float acc i (+ (aget acc i) (* g (aget ^floats buf i))))))
        acc)
      (float-array nframes))))
Conceptually, if \(\{x_k[i]\}\) are the input buffers (per-connection) and \(g_k\) are the per-cable gains, the
mixed signal is:
\[y[i] \;=\; \sum_{k=1}^{K} g_k \, x_k[i], \quad i = 0,1,\dots,n\!-\!1\]
Where:
\(n =\) nframes (the block size),
\(K =\) number of incoming connections into [id port].
Implementation notes:
We allocate acc as our accumulator buffer and initialize it to zeros.
For each incoming connection:
We pull from the source (src-id, src-port) via (:pull ctx).
We convert the cable’s gain to a float once (keeps the inner loop tight).
We add the scaled samples into acc.
If there are no connections, we return a zeroed buffer (silence). This is a convenient “ground” for the graph.
Time complexity for this step is \(O(K \cdot n)\) per port, which is exactly what you’d expect for mixing \(K\) streams.
Rendering a port with per-block memoization
(defn- render-port
  "Render [id port] with memoization for this audio block."
  [cache ctx id port nframes]
  (if-let [cached (get @cache [id port])]
    cached
    (let [m (@registry id)]
      (when-not m
        (throw (ex-info (str "Unknown module: " id) {})))
      (let [outbuf ((:process m) ctx m (keyword port) nframes)]
        (swap! cache assoc [id port] outbuf)
        outbuf))))
Why memoize? Consider one VCO feeding two different modules, both ultimately ending at your main mixer. In a naive pull
model, the VCO would be recomputed twice per block. We avoid that by caching the result buffer for [id port] the
first time it’s pulled in a block:
cache is an atom (a per-block memo table).
If we’ve already computed [id port], return the cached buffer.
Otherwise, we call the module’s :process function, stash the buffer, and return it.
This makes the pull model efficient even when the patch graph has lots of fan-out.
The context object (ctx)
;; ctx provides a way for modules to pull inputs
(defn- make-ctx [sr]
  {:sr sr
   :pull (fn [cache ctx id port nframes]
           (render-port cache ctx id port nframes))})
ctx bundles:
:sr — the sample rate (modules often need it for phase increments, envelopes, etc.).
:pull — the function modules call to obtain inputs. This keeps module code simple and testable.
Because :pull closes over render-port, modules don’t need to know about caching details or registry lookups —
they just ask the world for “the buffer at [id port]”.
flowchart TD
subgraph Block["nframes block render"]
MM["Main Mixer :process"] -->|:pull osc1 :out| VCO
MM -->|:pull lfo1 :out| LFO
VCO -->|:pull mod :in| SUM
LFO -->|:pull mod :in| SUM
SUM["sum-into"] -->|float buf| MM
end
style SUM fill:#eef,stroke:#77a
style MM fill:#efe,stroke:#7a7
style VCO fill:#fee,stroke:#a77
style LFO fill:#fee,stroke:#a77
Cache["Per-block cache"] --- MM
Cache --- VCO
Cache --- LFO
The cache sits beside the graph for the duration of a single block render. Any subsequent pulls of the
same [id port] return the memoized buffer.
Numerical notes (clipping and headroom)
Mixing is a straight sum. If your sources are near full-scale and you add them, you can exceed \([-1, 1]\) in the
mixed float domain, which will later clip when we convert to 16-bit in write-frames!. Options to consider (later):
Normalize or soft-clip in the mixer: \(y[i] \leftarrow \tanh(y[i])\) or a gentle limiter.
Encourage sub-unity gain on cables feeding into mixers.
Keep VCO defaults conservative (e.g., amplitude \(0.2\) or \(0.5\)) to preserve headroom.
Modules
With all of the setup finished, we can finally create some modules — the building blocks of a patch.
The module shape: mk-* vs. public constructor
Each module comes in two layers:
A maker (mk-vco, mk-lfo, mk-vca, …): returns a plain Clojure map that describes the module:
:id, :type, and a mutable :state atom
:inputs / :outputs port sets
a :process function with the signature (:process m) ctx m requested-port nframes -> float-array
A public constructor (vco, lfo, vca, …): a thin wrapper that calls the maker and then register!s the resulting module into the global registry. This pattern keeps the module definition pure/data-first and the side‑effect (registration) explicit.
The engine always drives modules through :process. If a module needs other signals, it pulls them
via sum-into (which uses the per‑block cache and respects cabling).
Voltage Controlled Oscillator (VCO)
A VCO produces periodic waveforms at audio rates. In this design:
Base frequency is :freq (Hz).
A control input :pitch (typically from an LFO or envelope) modulates the frequency by :pitch-depth (Hz per unit CV).
We render four classic shapes from the same phase accumulator: sine, square, saw, and reverse‑saw, each scaled by :amp.
Note on outputs: this VCO exposes :sine-out, :square-out, :saw-out, and :rev-saw-out. When
cabling, target one of those (e.g., :sine-out), not :out.
(defn- mk-vco
  [id {:keys [freq amp pitch-depth]
       :or {freq 220.0 amp 0.2 pitch-depth 50.0}}]
  (let [state (atom {:phase 0.0
                     :freq (double freq)
                     :amp (double amp)
                     :pitch-depth (double pitch-depth)})]
    {:id id
     :type :vco
     :state state
     :outputs #{:sine-out :square-out :saw-out :rev-saw-out}
     :inputs #{:pitch}
     :process
     (fn [ctx m port nframes]
       (let [{:keys [phase freq amp pitch-depth]} @(:state m)
             sr (:sr ctx)
             pitch-buf (sum-into (atom {}) ctx (:id m) :pitch nframes)
             two-pi (* 2.0 Math/PI)
             ;; output buffers
             sine-buf (float-array nframes)
             square-buf (float-array nframes)
             saw-buf (float-array nframes)
             rev-saw-buf (float-array nframes)]
         ;; run the block, capture final phase
         (let [final-ph
               (loop [i 0, ph phase]
                 (if (< i nframes)
                   (let [hz (max 0.0 (+ freq (* pitch-depth (aget ^floats pitch-buf i))))
                         ph2 (let [p (+ ph (/ (* two-pi hz) sr))]
                               (if (>= p two-pi) (- p two-pi) p))
                         norm-phase (/ ph two-pi) ; 0..1 based on current phase
                         sine (Math/sin ph)
                         square (if (< ph Math/PI) 1.0 -1.0)
                         saw (- (* 2.0 norm-phase) 1.0)
                         rev-saw (- 1.0 (* 2.0 norm-phase))]
                     (aset-float sine-buf i (float (* amp sine)))
                     (aset-float square-buf i (float (* amp square)))
                     (aset-float saw-buf i (float (* amp saw)))
                     (aset-float rev-saw-buf i (float (* amp rev-saw)))
                     (recur (inc i) ph2))
                   ph))]
           ;; persist the advanced phase
           (swap! (:state m) assoc :phase (double final-ph)))
         ;; return the requested port
         (case port
           :sine-out sine-buf
           :square-out square-buf
           :saw-out saw-buf
           :rev-saw-out rev-saw-buf
           (float-array nframes))))}))

(defn vco
  "Create and register a Voltage Controlled Oscillator (VCO) module.

  The VCO generates multiple waveforms and supports pitch modulation via the :pitch input.

  Inputs:
    :pitch — control signal in [-1.0 .. +1.0] range, multiplied by :pitch-depth (Hz)
             and added to :freq.

  Outputs:
    :sine-out, :square-out, :saw-out, :rev-saw-out

  Parameters:
    :freq        — base frequency in Hz (default = 220.0).
    :amp         — peak amplitude (default = 0.2).
    :pitch-depth — Hz per unit of :pitch CV (default = 50.0).

  Example:
    (vco \"osc1\" {:freq 440.0 :amp 0.25 :pitch-depth 20.0})
    (lfo \"mod1\" {:freq 5.0 :amp 1.0})
    (add-cable \"mod1\" \"sine-out\" \"osc1\" \"pitch\")
    (add-cable \"osc1\" \"sine-out\" \"main-mixer\" \"in\")"
  ([id] (vco id {}))
  ([id params] (register! (mk-vco id params))))
Low Frequency Oscillator (LFO)
An LFO is just an oscillator that runs at control rates (typically < 20 Hz). We use it to modulate other
parameters (pitch, amplitude, filter cutoff…). The math is identical to the VCO’s phase increment, just at a lower
:freq, and the output is usually not sent directly to the speakers.
(defn- mk-lfo
  [id {:keys [freq amp] :or {freq 2.0 amp 1.0}}]
  (let [state (atom {:phase 0.0 :freq (double freq) :amp (double amp)})]
    {:id id
     :type :lfo
     :state state
     :outputs #{:sine-out}
     :inputs #{}
     :process
     (fn [ctx m port nframes]
       (let [{:keys [phase freq amp]} @(:state m)
             sr (:sr ctx)
             out (float-array nframes)
             two-pi (* 2.0 Math/PI)]
         (let [final-ph
               (loop [i 0, ph phase]
                 (if (< i nframes)
                   (let [ph2 (let [p (+ ph (/ (* two-pi freq) sr))]
                               (if (>= p two-pi) (- p two-pi) p))
                         s (* amp (Math/sin ph))]
                     (aset-float out i (float s))
                     (recur (inc i) ph2))
                   ph))]
           (swap! (:state m) assoc :phase (double final-ph)))
         out))}))

(defn lfo
  "Create and register a Low Frequency Oscillator (LFO) module.

  Outputs:
    :sine-out — control-rate sine in [-amp .. +amp].

  Parameters:
    :freq — Hz (default = 2.0).
    :amp  — peak amplitude (default = 1.0).

  Example:
    (lfo \"mod1\" {:freq 5.0 :amp 1.0})
    (vco \"osc1\" {:freq 220.0 :amp 0.2})
    (add-cable \"mod1\" \"sine-out\" \"osc1\" \"pitch\")"
  ([id] (lfo id {}))
  ([id params] (register! (mk-lfo id params))))
Voltage Controlled Amplifier (VCA)
A VCA scales an audio signal by a gain derived from a control voltage (CV). A common musical use is tremolo:
feed a VCO into :in, an LFO into :cv, and you’ll hear periodic amplitude variation.
We map CV \(\in [-1,1]\) to gain \(\in [0,1]\) (plus an optional :bias) using:
\[\text{gain}_i = \operatorname{clamp}_{[0,1]}\!\left(\text{bias} + \tfrac{1}{2}(\text{cv}[i] + 1)\right)\]
The output sample is \(y[i] = \text{gain}_i \cdot x[i]\).
(defn- mk-vca
  [id {:keys [bias] :or {bias 0.0}}]
  (let [state (atom {:bias (double bias)})]
    {:id id
     :type :vca
     :state state
     :inputs #{:in :cv} ;; audio in, control voltage in [-1..1]
     :outputs #{:out}
     :process
     (fn [ctx m port nframes]
       (let [in (sum-into (atom {}) ctx (:id m) :in nframes)
             cv (sum-into (atom {}) ctx (:id m) :cv nframes)
             out (float-array nframes)
             bias (:bias @(:state m))]
         (dotimes [i nframes]
           ;; gain = clamp(bias + 0.5*(cv+1), 0, 1) -> maps cv [-1..1] to [0..1]
           (let [gain (max 0.0 (min 1.0 (+ bias (* 0.5 (+ 1.0 (aget ^floats cv i))))))
                 s (* gain (aget ^floats in i))]
             (aset-float out i (float s))))
         out))}))

(defn vca
  "Create and register a Voltage Controlled Amplifier (VCA) module.

  Inputs:
    :in — audio signal (float samples in [-1.0..1.0]).
    :cv — control voltage signal in [-1.0..+1.0].

  Output:
    :out — amplified audio.

  Parameter:
    :bias — DC offset added before clamping gain to [0..1] (default 0.0).

  Example (tremolo):
    (vco \"osc\" {:freq 220 :amp 0.25})
    (lfo \"mod\" {:freq 5.0 :amp 1.0})
    (vca \"amp1\" {:bias 0.5})
    (add-cable \"osc\" \"sine-out\" \"amp1\" \"in\")
    (add-cable \"mod\" \"sine-out\" \"amp1\" \"cv\")
    (add-cable \"amp1\" \"out\" \"main-mixer\" \"in\")"
  ([id] (vca id {}))
  ([id params] (register! (mk-vca id params))))
With these three modules you can already explore a surprising amount of sonic territory, and the pattern for adding
more is clear: define a small :state, specify ports, and implement :process that uses sum-into for inputs and
writes a block-sized buffer for outputs.
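For instance, the tremolo patch from the VCA docstring, end to end at the REPL:

(vco "osc"  {:freq 220 :amp 0.25})
(lfo "mod"  {:freq 5.0 :amp 1.0})
(vca "amp1" {:bias 0.5})
(add-cable "osc"  "sine-out" "amp1" "in")
(add-cable "mod"  "sine-out" "amp1" "cv")
(add-cable "amp1" "out" "main-mixer" "in")
(start-audio!) ; listen for a bit...
(stop-audio!)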
In our previous post, we got a basic “blinky” app running
on the Arm Cortex-M33 side of the RP2350 using Embassy and embassy-rp. This time, we’re reworking the same
application to target the RP2350’s RISC-V core instead—highlighting how to boot the
RISC-V Hazard3 core with Rust and control peripherals
using the rp-hal ecosystem.
This post walks through the key differences and required changes to adapt the project.
Most of this code is available in the examples section of the rp-hal repository.
What is RISC-V?
RISC-V (pronounced “risk-five”) is an open standard instruction set architecture (ISA) that
emerged from the University of California, Berkeley in 2010. Unlike proprietary ISAs such as x86 or Arm, RISC-V is
open and extensible—allowing anyone to design, implement, and manufacture RISC-V chips without licensing fees.
This openness has led to rapid adoption across academia, startups, and even large chipmakers. RISC-V cores can now be
found in everything from tiny embedded microcontrollers to Linux-capable SoCs and even experimental high-performance CPUs.
In the RP2350, RISC-V comes in the form of the Hazard3 core—a lightweight, open-source 3-stage RV32IMAC processor
developed by Raspberry Pi. It sits alongside the more familiar Arm Cortex-M33, making the RP2350 one of the first
widely accessible dual-ISA microcontrollers.
For embedded developers used to the Arm world, RISC-V introduces a slightly different toolchain and runtime, but the
basic concepts—GPIO control, clock configuration, memory mapping—remain very familiar.
In this post, we explore how to bring up a basic RISC-V application targeting the RP2350 Hazard3 core using Rust.
Switching to RISC-V: Overview
The RP2350’s second core is a Hazard3 RISC-V processor. To target it:
We switch toolchains from thumbv8m.main-none-eabihf to riscv32imac-unknown-none-elf
We drop the Embassy stack and use the rp235x-hal directly
We write or reuse suitable linker scripts and memory definitions
We adjust runtime startup, including clock and GPIO initialization
.cargo/config.toml Changes
We swap the build target and customize linker flags:
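A sketch of the relevant pieces (the exact flags live in the repository; treat the rustflags below as an approximation):

[build]
target = "riscv32imac-unknown-none-elf"

[target.riscv32imac-unknown-none-elf]
# rp235x_riscv.x includes link.x, inverting the usual pattern (see the note below).
rustflags = ["-C", "link-arg=-Trp235x_riscv.x"]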
Note how we invert the typical linker script behavior: rp235x_riscv.x now includes link.x instead of the other way
around.
The Rust target riscv32imac-unknown-none-elf tells the compiler to generate code for a 32-bit RISC-V architecture
(riscv32) that supports the I (integer), M (multiply/divide), A (atomic), and C (compressed) instruction set
extensions.
The unknown-none-elf part indicates a bare-metal environment with no OS (none) and output in the standard ELF binary
format. This target is a common choice for embedded RISC-V development.
We swapped in a dedicated rp235x_riscv.x linker script to reflect RISC-V memory layout. This script takes care of
startup alignment, section placement, and stack/heap boundaries.
The build.rs file was also extended to emit both memory.x and rp235x_riscv.x so that tooling remains consistent
across platforms.
Observations and Gotchas
Clock setup is still necessary, even though the RISC-V HAL avoids some of the abstractions of Embassy.
Runtime and exception handling differ between Arm and RISC-V: for example, default handlers like DefaultInterruptHandler and DefaultExceptionHandler must be provided.
The boot block and .bi_entries sections are still necessary for picotool metadata.
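For illustration, the handler requirement from the list above can be met by simply parking the core (handler names as given in the rp-hal examples; the exact attribute plumbing may vary by version):

// Sketch: catch-all handlers the RISC-V runtime expects us to supply.
#[allow(non_snake_case)]
#[no_mangle]
fn DefaultInterruptHandler() {
    loop {} // park the core on an unexpected interrupt
}

#[allow(non_snake_case)]
#[no_mangle]
fn DefaultExceptionHandler() {
    loop {} // park the core on an unexpected exception
}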
Conclusion
Today’s article was only a brief follow-up to the first article. All of these changes are available in a risc-v branch
that I’ve added to the original repository.