Cogs and Levers A blog full of technical stuff

The Magic of Diffie Hellman

Introduction

Imagine two people, Alice and Bob. They’re standing in a crowded room — everyone can hear them. Yet somehow, they want to agree on a secret password that only they know.

Sounds impossible, right?

That’s where Diffie–Hellman key exchange comes in. It’s a bit of mathematical magic that lets two people agree on a shared secret — even while everyone is listening.

Let’s walk through how it works — and then build a toy version in code to see it with your own eyes.

Mixing Paint

Let’s forget numbers for a second. Imagine this:

  1. Alice and Bob agree on a public color — let’s say yellow paint.
  2. Alice secretly picks red, and Bob secretly picks blue.
  3. They mix their secret color with the yellow:
    • Alice sends Bob the result of red + yellow.
    • Bob sends Alice the result of blue + yellow.
  4. Now each of them adds their secret color again:
    • Alice adds red to Bob’s mix: (yellow + blue) + red
    • Bob adds blue to Alice’s mix: (yellow + red) + blue

Both end up with the same final color: yellow + red + blue!

But someone watching only saw:

  • The public yellow
  • The mixes: (yellow + red), (yellow + blue)

They can’t reverse it to figure out the red or blue.

Mixing paint is easy, but un-mixing it is really hard.

From Paint to Numbers

In the real world, computers don’t mix colors — they work with math.

Specifically, Diffie–Hellman uses something called modular arithmetic. Modular arithmetic is just math where we “wrap around” at some number.

For example:

\[7 \mod 5 = 2\]

We’ll also use exponentiation — raising a number to a power.

And here’s the core of the trick: it’s easy to compute this:

\[\text{result} = g^{\text{secret}} \mod p\]

But it’s hard to go backward and find the secret, even if you know result, g, and p.

This is the secret sauce behind Diffie–Hellman.
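To get a feel for how cheap the forward direction is, here is a small sketch using Python's built-in three-argument pow. The prime 2**127 - 1, the base, and the secret below are illustrative choices only, not parameters from any real protocol.

```python
# Modular exponentiation stays fast even when the numbers are large.
p = 2**127 - 1          # a known Mersenne prime, chosen only for illustration
g = 5                   # illustrative base (assumption, not a vetted generator)
secret = 123456789012345678901234567890

# pow(g, secret, p) computes g^secret mod p without ever building g^secret
result = pow(g, secret, p)

# For a small exponent we can cross-check against the naive formula
small = 20
assert pow(g, small, p) == (g ** small) % p

# The result always wraps around into the range 0..p-1
print(0 <= result < p)
```

Going the other way, from result back to secret, has no comparably quick shortcut when p is large and well chosen.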

A Toy Implementation

Let’s see this story in action.

import random

# Publicly known numbers
p = 23      # A small prime number
g = 5       # A primitive root modulo p (more on this later)

print("Public values:  p =", p, ", g =", g)

# Alice picks a private number
a = random.randint(1, p-2)
A = pow(g, a, p)   # A = g^a mod p

# Bob picks a private number
b = random.randint(1, p-2)
B = pow(g, b, p)   # B = g^b mod p

print("Alice sends:", A)
print("Bob sends:  ", B)

# Each computes the shared secret
shared_secret_alice = pow(B, a, p)   # B^a mod p
shared_secret_bob = pow(A, b, p)     # A^b mod p

print("Alice computes shared secret:", shared_secret_alice)
print("Bob computes shared secret:  ", shared_secret_bob)

Running this (your results may vary due to random number selection), you’ll see something like this:

Public values:  p = 23 , g = 5
Alice sends: 10
Bob sends:   2
Alice computes shared secret: 8
Bob computes shared secret:   8

The important part here is that Alice and Bob both end up with the same shared secret.

Let’s break down this code, line by line.

p = 23
g = 5

These are public constants. Going back to the paint analogy, you can think of p as the size of the palette and g as our base “colour”. We are ok with these being known to anybody.

a = random.randint(1, p-2)
A = pow(g, a, p)

Alice chooses a secret number a, and then computes \(A = g^a \mod p\). This is her public key, the equivalent of “red + yellow”.

Bob does the same with his secret b, producing B.

shared_secret_alice = pow(B, a, p)
shared_secret_bob = pow(A, b, p)

They both raise the other’s public key to their secret power. And because of how exponentiation works, both arrive at the same final value:

\[(g^b)^a \mod p = (g^a)^b \mod p\]

This simplifies to:

\[g^{ab} \mod p\]

This is the shared secret.
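The identity can be checked by hand with small numbers. The sketch below fixes the private values to a = 3 and b = 2 (my choice, because they reproduce the sample output shown earlier) instead of picking them at random.

```python
p, g = 23, 5
a, b = 3, 2            # fixed private values, chosen to match the sample run

A = pow(g, a, p)       # 5^3 = 125, and 125 mod 23 = 10
B = pow(g, b, p)       # 5^2 = 25,  and 25 mod 23 = 2

alice_secret = pow(B, a, p)   # 2^3 = 8
bob_secret = pow(A, b, p)     # 10^2 = 100, and 100 mod 23 = 8
direct = pow(g, a * b, p)     # 5^6 mod 23, computed in one step

print(A, B, alice_secret, bob_secret, direct)   # 10 2 8 8 8
```

All three routes land on the same value, which is exactly what the equation above promises.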

Try it yourself

Try running the toy code above multiple times. You’ll see that:

  • Every time, Alice and Bob pick new private numbers.
  • They still always agree on the same final shared secret.

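And yet… if someone was eavesdropping, they’d only see p, g, A, and B. With large, properly chosen parameters, that’s not enough to figure out a, b, or the final shared secret: doing so means solving the discrete logarithm problem, for which no efficient algorithm is known on classical computers. Our toy p = 23 is far too small to be secure; real deployments use primes of 2048 bits or more, or elliptic-curve variants.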
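With a prime as small as the toy p = 23, an eavesdropper can simply try every exponent, which is a good way to see why real parameters have to be enormous. The sketch below is only an illustration; the name brute_force_dlog is a placeholder of mine, not a library function.

```python
def brute_force_dlog(g, target, p):
    """Try every exponent until g^x mod p equals target (hypothetical helper)."""
    for x in range(1, p):
        if pow(g, x, p) == target:
            return x
    return None

p, g = 23, 5
a = 6                      # Alice's "secret", fixed here for the demo
A = pow(g, a, p)           # what the eavesdropper sees on the wire

recovered = brute_force_dlog(g, A, p)
print(A, recovered)        # 8 6
```

The loop runs at most p - 1 times, so it is instant for p = 23 and hopeless for a 2048-bit prime.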

It’s not perfect

Diffie–Hellman is powerful, but there’s a catch: it doesn’t authenticate the participants.

If a hacker, Mallory, can intercept the messages, she could do this:

  • Pretend to be Bob when talking to Alice
  • Pretend to be Alice when talking to Bob

Now she has two separate shared secrets — one with each person — and can man-in-the-middle the whole conversation.

So in practice, Diffie–Hellman is used with authentication — like digital certificates or signed messages — to prevent this attack.
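Mallory's attack can be simulated with the same toy numbers. This is a sketch with fixed private values (a = 3, b = 2, m = 4 are arbitrary picks) so the arithmetic is easy to follow.

```python
p, g = 23, 5

a = 3                     # Alice's private value
b = 2                     # Bob's private value
m = 4                     # Mallory's private value

A = pow(g, a, p)          # Alice's public value, intercepted by Mallory
B = pow(g, b, p)          # Bob's public value, intercepted by Mallory
M = pow(g, m, p)          # Mallory sends her own value to both sides instead

alice_key = pow(M, a, p)            # Alice believes she shares this with Bob
bob_key = pow(M, b, p)              # Bob believes he shares this with Alice
mallory_alice_key = pow(A, m, p)    # Mallory's copy of Alice's key
mallory_bob_key = pow(B, m, p)      # Mallory's copy of Bob's key

print(alice_key, mallory_alice_key)   # 18 18
print(bob_key, mallory_bob_key)       # 16 16
```

Alice and Bob end up with different keys, and Mallory knows both, so she can decrypt, read, and re-encrypt everything that passes through her.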

So, the sorts of applications you’ll see this used in are:

  • TLS / HTTPS (the “S” in secure websites)
  • VPNs
  • Secure messaging (like Signal)
  • SSH key exchanges

It’s one of the fundamental building blocks of internet security.

Conclusion

Diffie–Hellman feels like a magic trick: two people agree on a secret, in public, without ever saying the secret out loud.

It’s one of the most beautiful algorithms in cryptography — simple, powerful, and still rock-solid almost 50 years after it was invented.

And now, you’ve built one yourself.

Fuzz testing C Binaries on Linux

Introduction

Fuzz testing is the art of breaking your software on purpose. By feeding random or malformed input into a program, we can uncover crashes, logic errors, or even security vulnerabilities — all without writing specific test cases.

In memory-unsafe languages like C, fuzzing is especially powerful. In just a few lines of shell script, we can hammer a binary until it falls over.

This guide shows how to fuzz a tiny C program using just cat /dev/urandom, and how to track down and analyze the crash with gdb.

The Target

First off we need our test candidate. By design this program is vulnerable through its use of strcpy.

#include <stdio.h>
#include <string.h>

void vulnerable(char *input) {
    char buffer[64];
    strcpy(buffer, input);  // Deliberately unsafe
}

int main() {
    char input[1024];
    fread(input, 1, sizeof(input), stdin);
    vulnerable(input);
    return 0;
}

In main, we read up to 1 KB of data from stdin. A pointer to that data is then passed into the vulnerable function, which copies it into a 64-byte buffer, well under the 1 KB that could come through the front door.

strcpy doesn’t care though. It will keep copying data until it encounters a null terminator.

This is our problem.

Let’s get this program built with some debugging information:

gcc -g -o vuln vuln.c

Basic “Dumb” Fuzzer

We have plenty of tools at our disposal directly at the Linux console, so we can put together a simple fuzz tester without installing anything extra.

Here’s fuzzer.sh:

# allow core dumps
ulimit -c unlimited

# send in some random data
cat /dev/urandom | head -c 100 | ./vuln

100 bytes should be enough to trigger some problems internally.

Running the fuzzer, we should see something similar to this:

*** stack smashing detected ***: terminated
[1]    4773 broken pipe                    cat /dev/urandom |
4774 done                                  head -c 100 |
4775 IOT instruction (core dumped)         ./vuln

We get some immediate feedback in stack smashing detected.
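The same idea can be wrapped in a loop so that many random inputs are tried and the crashing ones are kept. Below is a minimal Python sketch of that, assuming a POSIX system (where a negative return code means the process was killed by a signal). The function names fuzz_once and fuzz are my own.

```python
import os
import signal
import subprocess

def fuzz_once(cmd, n_bytes=100, timeout=5):
    """Run cmd once with n_bytes of random data on stdin."""
    data = os.urandom(n_bytes)
    proc = subprocess.run(
        cmd,
        input=data,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,   # raises subprocess.TimeoutExpired if the target hangs
    )
    return data, proc.returncode

def fuzz(cmd, runs=100, n_bytes=100):
    """Collect (input, signal name) for every run that was killed by a signal."""
    crashes = []
    for _ in range(runs):
        data, returncode = fuzz_once(cmd, n_bytes)
        if returncode < 0:   # on POSIX, -N means "terminated by signal N"
            crashes.append((data, signal.Signals(-returncode).name))
    return crashes
```

Calling fuzz(["./vuln"]) against the binary built earlier should return a list of inputs paired with signal names such as SIGABRT, which you can then save and replay.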

Where’s the Core Dump?

On modern Linux systems, core dumps don’t always appear in your working directory. Instead, they may be captured by systemd-coredump and stored elsewhere.

In order to get a list of core dumps, you can use coredumpctl:

coredumpctl list

You’ll get a big report of all the core dumps that your system has gone through. You can use the PID that crashed to reference the dump that is specifically yours.

TIME                            PID  UID  GID SIG     COREFILE EXE            SIZE
Sun 2025-04-20 11:02:14 AEST   4775 1000 1000 SIGABRT present  /path/to/vuln  19.4K

Debugging the dump

We can get our hands on these core dumps in a couple of ways.

We can launch gdb directly via coredumpctl. This will load the crashing binary and the core file into GDB.

coredumpctl gdb 4775

I added the specific failing PID to my command; otherwise, this will use the latest core dump.

Inside GDB:

bt              # backtrace
info registers  # cpu state at crash
list            # show source code around crash

Alternatively, if you want a physical copy of the dump in your local directory, you can get your hands on it with this:

coredumpctl dump --output=core.vuln

AFL

Once you’ve had your fun with cat /dev/urandom, it’s worth exploring more sophisticated fuzzers that generate inputs intelligently — like AFL (American Fuzzy Lop).

AFL instruments your binary to trace code coverage and then evolves inputs that explore new paths.

Install

First of all, we need to install afl on our system.

pacman -S afl

Running

Now we can re-compile our executable but this time with AFL’s instrumentation:

afl-cc -g -o vuln-afl vuln.c

Before we can run our test, we need to create an input corpus: a minimal set of valid (or near-valid) inputs. AFL uses these seeds as its starting point and mutates them into new inputs.

mkdir input
echo "AAAA" > input/seed

Before we run, AFL expects a few performance-related kernel settings to be in place.

We need to tell the CPU to run at maximum frequency with the following (run as root):

cd /sys/devices/system/cpu
echo performance | tee cpu*/cpufreq/scaling_governor

For more details about these settings, have a look at the CPU frequency scaling documentation.

Now, we run AFL!

mkdir output
afl-fuzz -i input -o output ./vuln-afl

You should now see a live updating dashboard like the following, detailing all of the events that are occurring through the many different runs of your application:

american fuzzy lop ++4.31c {default} (./vuln-afl) [explore]          
┌─ process timing ────────────────────────────────────┬─ overall results ────┐
│        run time : 0 days, 0 hrs, 0 min, 47 sec      │  cycles done : 719   │
│   last new find : none yet (odd, check syntax!)     │ corpus count : 1     │
│last saved crash : none seen yet                     │saved crashes : 0     │
│ last saved hang : none seen yet                     │  saved hangs : 0     │
├─ cycle progress ─────────────────────┬─ map coverage┴──────────────────────┤
│  now processing : 0.2159 (0.0%)      │    map density : 12.50% / 12.50%    │
│  runs timed out : 0 (0.00%)          │ count coverage : 449.00 bits/tuple  │
├─ stage progress ─────────────────────┼─ findings in depth ─────────────────┤
│  now trying : havoc                  │ favored items : 1 (100.00%)         │
│ stage execs : 39/100 (39.00%)        │  new edges on : 1 (100.00%)         │
│ total execs : 215k                   │ total crashes : 0 (0 saved)         │
│  exec speed : 4452/sec               │  total tmouts : 0 (0 saved)         │
├─ fuzzing strategy yields ────────────┴─────────────┬─ item geometry ───────┤
│   bit flips : 0/0, 0/0, 0/0                        │    levels : 1         │
│  byte flips : 0/0, 0/0, 0/0                        │   pending : 0         │
│ arithmetics : 0/0, 0/0, 0/0                        │  pend fav : 0         │
│  known ints : 0/0, 0/0, 0/0                        │ own finds : 0         │
│  dictionary : 0/0, 0/0, 0/0, 0/0                   │  imported : 0         │
│havoc/splice : 0/215k, 0/0                          │ stability : 100.00%   │
│py/custom/rq : unused, unused, unused, unused       ├───────────────────────┘
│    trim/eff : 20.00%/1, n/a                        │          [cpu000: 37%]
└─ strategy: explore ────────── state: started :-) ──

Unlike /dev/urandom, AFL:

  • Uses feedback to mutate inputs intelligently
  • Tracks code coverage
  • Detects crashes, hangs, and timeouts
  • Can auto-reduce inputs that cause crashes

It’s like the /dev/urandom method — but on steroids, with data-driven evolution.

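The output folder will hold all the telemetry from the many runs that AFL is performing. Any crashes and hangs are saved there for later inspection. These are not core dumps: each saved file is the input that triggered the crash or hang, so you can feed it back into the binary (under gdb, for example) to reproduce the failure.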
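To re-run saved inputs against the target, a small replay script can help. This is a sketch under assumptions: AFL++ typically writes crashing inputs as individual files under output/default/crashes alongside a README.txt, but check the layout your version produces. The function name replay_inputs is hypothetical.

```python
import subprocess
from pathlib import Path

def replay_inputs(cmd, input_dir, timeout=5):
    """Feed every file in input_dir to cmd on stdin; map file name -> return code."""
    results = {}
    for path in sorted(Path(input_dir).iterdir()):
        # Skip sub-directories and the README that AFL may leave there (assumption)
        if not path.is_file() or path.name == "README.txt":
            continue
        proc = subprocess.run(
            cmd,
            input=path.read_bytes(),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        results[path.name] = proc.returncode
    return results
```

For example, replay_inputs(["./vuln"], "output/default/crashes") would report a negative return code for each input that still kills the non-instrumented binary.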

Conclusion

Fuzzing is cheap, dumb, and shockingly effective. If you’re writing C code, run a fuzzer against your tools. You may find bugs that formal tests would never hit — and you’ll learn a lot about your program’s internals in the process.

If you’re interested in going deeper, check out more advanced fuzzers like:

  • AFL (American Fuzzy Lop): coverage-guided fuzzing via input mutation
  • LibFuzzer: fuzzing entry points directly in code
  • Honggfuzz: another smart fuzzer with sanitizer integration
  • AddressSanitizer (ASan): not a fuzzer, but an excellent runtime checker for memory issues

These tools can take you from basic input crashes to deeper vulnerabilities, all without modifying too much of your workflow.

Happy crashing.

Build your own Genetic Algorithm

Introduction

Genetic algorithms (GAs) are one of those wild ideas in computing where the solution isn’t hand-coded — it’s grown.

They borrow inspiration straight from biology. Just like nature evolved eyes, wings, and brains through selection and mutation, we can evolve solutions to problems in software. Not by brute-force guessing, but by letting generations of candidates compete, reproduce, and adapt.

At a high level, a genetic algorithm looks like this:

  1. Create a population of random candidate solutions.
  2. Score each one — how “fit” or useful is it?
  3. Select the best performers.
  4. Breed them together to make the next generation.
  5. Mutate some of them slightly, to add variation.
  6. Repeat until something good enough evolves.

There’s no central intelligence. No clever algorithm trying to find the best answer. Just selection pressure pushing generations toward better solutions — and that’s often enough.

What’s powerful about GAs is that they’re not tied to any specific kind of problem. If you can describe what a “good” answer looks like, even fuzzily, a GA might be able to evolve one. People have used them to:

  • Evolve art or music
  • Solve optimization problems
  • Train strategies for games
  • Design antennas for NASA

In this post, we’re going to build a genetic algorithm from scratch — in pure Python — and show it working on a fun little challenge: evolving a string of text until it spells out "HELLO WORLD".

It might be toy-sized, but the core principles are exactly the same as the big stuff.

Defining the parts

Here, we’ll break down the genetic algorithm idea into simple, solvable parts.

Solution

First of all, we need to define a solution. The solution is what we want to work towards. It can be considered our “chromosome” in this example.

TARGET = "HELLO WORLD"

Every character of this string can then be considered a “gene”.

Defining Fitness

Now we need a function that tells us how fit our individual is, or how close it is to our defined target:

def fitness(individual):
    return sum(1 for i, j in zip(individual, TARGET) if i == j)

Here, we pair each “gene” (char) of the individual and target. We simply count up how many of them match. Higher score means higher fitness.
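As a quick sanity check, here is the same function applied to a few candidates taken from the sample run shown later in this post.

```python
TARGET = "HELLO WORLD"

def fitness(individual):
    # Count the positions where the candidate matches the target
    return sum(1 for i, j in zip(individual, TARGET) if i == j)

scores = {}
for candidate in ["RAVY  OTRTK", "SFLLO WORLD", "HFLLO WORLD", "HELLO WORLD"]:
    scores[candidate] = fitness(candidate)
    print(candidate, scores[candidate])
# Scores, in order: 2, 9, 10, 11
```

A perfect match scores len(TARGET), which is 11 here, and that is the stopping condition for the whole algorithm.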

Populate

We create an initial population to work with, just with some random data. We need to start somewhere, so this is as good as anything.

population = [random_individual() for _ in range(POP_SIZE)]

random_individual will return a string the same size as our solution, but will go with random characters at each index. This provides our starting point.

Genetics

Two key operations give genetic algorithms their evolutionary flavor: crossover and mutation. This is what gives us generations, allowing this algorithm to grow.

Crossover (Recombination)

In biology, crossover happens when two parents create a child: their DNA gets shuffled together. A bit from mum, a bit from dad, spliced at some random point. The child ends up with a new mix of traits, some from each.

We can do exactly that with strings of characters (our “DNA”). Here’s the basic idea:

def crossover(a, b):
    split = random.randint(0, len(a))
    return a[:split] + b[split:]

This picks a random point in the string, then takes the first part from parent a and the second part from parent b. So if a is "HELLOXXXXX" and b is "YYYYYWORLD", a crossover at position 4 gives you "HELLYWORLD". New combinations, new possibilities.
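To see what single-point crossover can produce, it helps to take the randomness out and enumerate every split point. The helper crossover_at below is my own deterministic variant of the function above, and the two parent strings are made up for the demonstration.

```python
def crossover_at(a, b, split):
    # Same splice as crossover(), but with the split point passed in
    return a[:split] + b[split:]

a = "HELLO XXXXX"
b = "YYYYYYWORLD"

# Every child that single-point crossover can produce from these parents
children = [crossover_at(a, b, s) for s in range(len(a) + 1)]

print(children[0])    # YYYYYYWORLD  (split at 0: all of b)
print(children[6])    # HELLO WORLD  (first six characters from a, rest from b)
print(children[11])   # HELLO XXXXX  (split at the end: all of a)
```

Only one of the twelve possible children is the target, which is why the algorithm needs many individuals and many generations to get lucky.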

Mutation

Of course, biology isn’t just about inheritance — it also relies on randomness. DNA can get copied imperfectly: a flipped bit, a swapped base. Most mutations are useless. But every once in a while, one’s brilliant.

Same deal in our algorithm:

def mutate(s):
    s = list(s)
    i = random.randint(0, len(s) - 1)
    s[i] = random_char()
    return "".join(s)

This picks a random character in the string and replaces it with a new random one — maybe turning an "X" into a "D", or an "O" into an "E". It adds diversity to the population and prevents us from getting stuck in a rut.

Together, crossover and mutation give us the raw machinery of evolution: recombination and novelty. With just these two tricks, plus a way to score fitness and select the best candidates, we can grow something surprisingly smart from totally random beginnings.

Putting it all together

Now, we just loop on this. We do this over and over, until we land on a solution that matches the target we fed the system to begin with. You can see the “HELLO WORLD” example in action here, and exactly how the algorithm came to its answer:

Gen    0 | Best: RAVY  OTRTK | Score: 2
Gen    1 | Best: RAVY  OTRLH | Score: 3
Gen    2 | Best: LZELORMOHLD | Score: 5
Gen    3 | Best: SFLLORMOHLD | Score: 6
Gen    4 | Best: SFLLO OTRLD | Score: 7
Gen    5 | Best: SFLLO OTRLD | Score: 7
Gen    6 | Best: SFLLO OORLD | Score: 8
Gen    7 | Best: SFLLO MORLD | Score: 8
Gen    8 | Best: SFLLO MORLD | Score: 8
Gen    9 | Best: SFLLO MORLD | Score: 8
Gen   10 | Best: SFLLO MORLD | Score: 8
Gen   11 | Best: SFLLO MORLD | Score: 8
Gen   12 | Best: SFLLO SORLD | Score: 8
Gen   13 | Best: NFLLO MORLD | Score: 8
Gen   14 | Best: SFLLO MORLD | Score: 8
Gen   15 | Best: SFLLO WORLD | Score: 9
Gen   16 | Best: SFLLO WORLD | Score: 9
Gen   17 | Best: HFLLO WORLD | Score: 10
Gen   18 | Best: HFLLO WORLD | Score: 10
Gen   19 | Best: HFLLO WORLD | Score: 10
Gen   20 | Best: HFLLO WORLD | Score: 10
Gen   21 | Best: HFLLO WORLD | Score: 10
Gen   22 | Best: HELLO WORLD | Score: 11

Exactly how many generations it takes depends on how your random number generator is feeling, so your mileage will vary.

Code listing

A full code listing of this in action is here:

import random
import string

TARGET = "HELLO WORLD".upper()
POP_SIZE = 100
MUTATION_RATE = 0.01
GENERATIONS = 100000

def random_char():
    return random.choice(string.ascii_uppercase + " ")

def random_individual():
    return ''.join(random_char() for _ in range(len(TARGET)))

def fitness(individual):
    return sum(1 for i, j in zip(individual, TARGET) if i == j)

def mutate(individual):
    return ''.join(
        c if random.random() > MUTATION_RATE else random_char()
        for c in individual
    )

def crossover(a, b):
    split = random.randint(0, len(a) - 1)
    return a[:split] + b[split:]

# Initial population
population = [random_individual() for _ in range(POP_SIZE)]

for generation in range(GENERATIONS):
    scored = [(ind, fitness(ind)) for ind in population]
    scored.sort(key=lambda x: -x[1])

    best = scored[0]
    print(f"Gen {generation:4d} | Best: {best[0]} | Score: {best[1]}")

    if best[1] == len(TARGET):
        print("🎉 Target reached!")
        break

    # Keep top 10 as parents
    parents = [ind for ind, _ in scored[:10]]

    # Make new population
    population = [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(POP_SIZE)
    ]

Conclusion

Genetic algorithms are a beautiful way to turn randomness into results. You don’t need deep math or fancy machine learning models — just a way to measure how good something is, and the patience to let evolution do its thing.

What’s even more exciting is that this toy example uses the exact same principles behind real-world tools that tackle complex problems in scheduling, design, game playing, and even neural network tuning.

If you’re curious to take this further, there are full-featured libraries and frameworks out there built for serious applications:

  • DEAP (Distributed Evolutionary Algorithms in Python) – A flexible framework for evolutionary algorithms. Great for research and custom workflows.
  • PyGAD – Simple, powerful, and easy to use — especially good for optimizing neural networks and functions.
  • ECJ – A Java-based evolutionary computing toolkit used in academia and industry.
  • Jenetics – Another Java library that’s modern, elegant, and geared toward real engineering problems.

These libraries offer more advanced crossover strategies, selection techniques (like tournament or roulette-wheel), and even support for parallel processing or multi-objective optimization.
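Tournament selection is simple enough to sketch by hand. The version below is a minimal illustration that reuses the fitness function from this post; the name tournament_select and the default k=3 are my own choices, not taken from any of the libraries above.

```python
import random

TARGET = "HELLO WORLD"

def fitness(individual):
    return sum(1 for i, j in zip(individual, TARGET) if i == j)

def tournament_select(population, k=3):
    """Pick k distinct individuals at random and return the fittest of them."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)

population = ["HELLO WORLD", "XXXXXXXXXXX", "HELLOXXXXXX", "YYYYYYWORLD"]
parent = tournament_select(population, k=3)
print(parent, fitness(parent))
```

In the listing above, this could stand in for the "keep top 10" step by building parents as [tournament_select(population) for _ in range(10)], which keeps some weaker individuals in play and preserves more diversity.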

But even with a simple string-matching example, you’ve now seen how it all works under the hood — survival of the fittest, one generation at a time.

Now go evolve something weird.

Writing A Simple ARM OS - Part 4

Introduction

In Part 3, we explored ARM calling conventions, debugging, and cleaned up our UART driver. While assembly has given us fine control over the hardware, writing an entire OS in assembly would be painful.

It’s time to enter C land.

This post covers:

  • Modifying the bootloader to transition from assembly to C
  • Updating the UART driver to be callable from C
  • Writing our first C function (kmain())
  • Adjusting the Makefile and linker script for C support

Booting into C

We still need a bit of assembly to set up the stack and call kmain(). Let’s start by modifying our bootloader.

Updated bootloader.s

.section .text
.global _start

_start:
    LDR sp, =stack_top  @ Set up the stack
    BL kmain            @ Call the C kernel entry function
    B .                 @ Hang forever

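.section .bss
.align 4
stack_bottom:
.space 1024         @ Reserve 1 KB for the stack
stack_top:          @ The stack grows downward, so sp starts at the high end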

What’s changed?

  • We load the stack pointer (sp) before calling kmain(). This ensures C has a valid stack to work with.
  • We branch-and-link (BL) to kmain(), following ARM’s calling conventions.
  • The infinite loop (B .) prevents execution from continuing into unknown memory if kmain() ever returns.

With this setup, execution will jump to kmain()—which we’ll define next.

Our First C Function: kmain()

Now that we can transition from assembly to C, let’s create our first function.

kmain.c

#include "uart.h"

void kmain() {
    uart_puts("Hello from C!\n");
    while (1);
}

What’s happening?

  • We include our uart.h header so we can call uart_puts().
  • kmain() prints "Hello from C!" using our UART driver.
  • The infinite while(1); loop prevents execution from continuing into unknown territory.

At this point, our OS will boot from assembly, call kmain(), and print text using our UART driver—but we need to make a few more changes before this compiles.

Making the UART Driver Callable from C

Right now, uart_puts and uart_putc are assembly functions. To call them from C, we need to:

  1. Ensure they follow the ARM calling convention.
  2. Declare them properly in a header file.

uart.h (Header File)

#ifndef UART_H
#define UART_H

void uart_putc(char c);
void uart_puts(const char *str);

#endif

Updated uart.s

.section .text
.global uart_putc
.global uart_puts

uart_putc:
    PUSH {lr}
    ldr r1, =0x101f1000  @ UART0 Data Register
    STRB r0, [r1]        @ Store byte
    POP {lr}
    BX lr

uart_puts:
    PUSH {lr}

next_char:
    LDRB r1, [r0], #1    @ Load byte from string
    CMP r1, #0
    BEQ done

wait_uart:
    LDR r2, =0x101f1018  @ UART0 Flag Register
    LDR r3, [r2]
    TST r3, #0x20        @ Check if TX FIFO is full
    BNE wait_uart

    LDR r2, =0x101f1000  @ UART0 Data Register
    STR r1, [r2]         @ Write character
    B next_char

done:
    POP {lr}
    BX lr

How this works:

  • Function names are declared .global so they are visible to the linker.
  • uart_putc(char c)
    • Expects a character in r0 (following ARM’s C calling convention).
    • Writes r0 to the UART data register.
  • uart_puts(const char *str)
    • Expects a pointer in r0.
    • Iterates through the string, sending each character until it reaches the null terminator (\0).
  • Preserving Registers
    • PUSH {lr} saves the link register on entry, and the matching POP {lr} restores it before returning.

Updating the Build System

To compile both assembly and C, we need to adjust the Makefile and linker script.

Updated Makefile

# Makefile for the armos project

# Cross-compiler tools (assuming arm-none-eabi toolchain)
AS = arm-none-eabi-as
CC = arm-none-eabi-gcc
LD = arm-none-eabi-ld
OBJCOPY = arm-none-eabi-objcopy

# Compiler and assembler flags
CFLAGS = -ffreestanding -nostdlib -O2 -Wall -Wextra
ASFLAGS =
LDFLAGS = -T linker.ld -nostdlib

# Files and directories
BUILD_DIR = build
TARGET = armos.elf
BIN_TARGET = armos.bin

# Source files
ASM_SRCS = asm/bootloader.s asm/uart.s
C_SRCS = src/kmain.c
OBJS = $(BUILD_DIR)/bootloader.o $(BUILD_DIR)/uart.o $(BUILD_DIR)/kmain.o

# Build rules
all: $(BUILD_DIR)/$(BIN_TARGET)

$(BUILD_DIR):
	@mkdir -p $(BUILD_DIR)

# Assemble the bootloader and UART
$(BUILD_DIR)/bootloader.o: asm/bootloader.s | $(BUILD_DIR)
	$(AS) $(ASFLAGS) -o $@ $<

$(BUILD_DIR)/uart.o: asm/uart.s | $(BUILD_DIR)
	$(AS) $(ASFLAGS) -o $@ $<

# Compile the C source file
$(BUILD_DIR)/kmain.o: src/kmain.c | $(BUILD_DIR)
	$(CC) $(CFLAGS) -c -o $@ $<

# Link everything together
$(BUILD_DIR)/$(TARGET): $(OBJS)
	$(LD) $(LDFLAGS) -o $@ $(OBJS)

# Convert ELF to binary
$(BUILD_DIR)/$(BIN_TARGET): $(BUILD_DIR)/$(TARGET)
	$(OBJCOPY) -O binary $< $@

# Clean build artifacts
clean:
	rm -rf $(BUILD_DIR)

Updating the Linker Script

The linker script ensures that kmain() and our code are properly loaded in memory.

Updated linker.ld

ENTRY(_start)

SECTIONS
{
    . = 0x10000; /* Load address of the kernel */

    .text : {
        *(.text*)
    }

    .rodata : {
        *(.rodata*)
    }

    .data : {
        *(.data*)
    }

    .bss : {
        *(COMMON)
        *(.bss*)
    }
}

Key Changes:

  • Code starts at 0x10000, ensuring it is loaded correctly.
  • .text, .rodata, .data, and .bss sections are properly defined.

Build

Now that all of these changes are in place, we can make our kernel and run it. If everything has gone to plan, you should see our kernel telling us that it’s jumped to C.

Hello from C!

Conclusion

We’ve successfully transitioned from pure assembly to using C for higher-level logic, while keeping low-level hardware interaction in assembly.

The code for this article is available in the GitHub repo.

Getting Started Developing for the LILYGO T-Deck

Introduction

The LILYGO T-Deck is a compact, powerful handheld development device based on the ESP32-S3 microcontroller. It features a 2.8-inch touchscreen, keyboard, trackball, microphone, speaker, and optional LoRa/GPS support, making it ideal for portable embedded systems, IoT applications, and even cybersecurity projects.

In this post, we’ll explore:

  • What the ESP32 microcontroller is.
  • The ESP32-S3 architecture and why it’s powerful.
  • How to set up Arduino IDE for development
  • How to set up ESP-IDF for development
  • Writing and flashing your first ESP-IDF program to print output to the serial monitor.
  • Troubleshooting common setup issues.

What is the ESP32?

The ESP32 is a family of low-cost, low-power system-on-chip (SoC) microcontrollers developed by Espressif Systems. It is widely used for IoT, wireless communication, embedded systems, and AI applications due to its feature-rich architecture.

Some of the key features of the ESP32 family are:

  • Xtensa LX6 (ESP32), Xtensa LX7 (ESP32-S2/S3), or RISC-V (ESP32-C3) processors, single- or dual-core depending on the model.
  • Wi-Fi 802.11 b/g/n and Bluetooth 4.2/5.0 support.
  • Ultra-low power consumption with deep sleep modes.
  • Rich peripherals: SPI, I2C, I2S, UART, ADC, DAC, PWM, and capacitive touch.
  • On-chip SRAM and external PSRAM support.
  • Real-time processing with FreeRTOS.

This alone is an awesome platform to put your development projects together.

ESP32-S3

The ESP32-S3 features a dual-core 32-bit Xtensa LX7 CPU with AI acceleration support and integrated USB, making it ideal for IoT, edge computing, and AI-powered applications.

Development Environment

We need a way to be able to develop software for this chip, so we have some things to install.

You can use a lot of different tools to write your software. Each has its own plugins that you can use to get the code flashed onto hardware. I use the Arduino IDE as it’s just simple to use.

Arduino IDE

The quickest way to get started is to follow the steps on the Xinyuan-LilyGO / T-Deck instructions up on GitHub. I’ve summarised those steps here for reference.

First up, get Arduino IDE installed.

Once you’ve got Arduino IDE running, open up “Preferences” to the “Settings” tab. We need to add an additional board manager URL for the ESP32 series of boards: https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json. I had quite a few issues running code that included TFT_eSPI unless I ran version 2.0.14 of these boards.

After this step you should be able to select ESP32S3 Dev Module as your board. This is what we’ll be deploying to.

From the Xinyuan-LilyGO / T-Deck repository, take all of the libraries under the lib folder and copy them into your Arduino libraries folder. You should end up with something similar to this in your Arduino folder:

└── libraries
    ├── AceButton
    ├── Arduino_GFX
    ├── es7210
    ├── ESP32-audioI2S
    ├── lvgl
    ├── RadioLib
    ├── SensorsLib
    ├── TFT_eSPI
    ├── TinyGPSPlus
    └── TouchLib

To finish the configuration, the “Tools” menu should have the following settings:

| Setting                              | Value                           |
|--------------------------------------|---------------------------------|
| Board                                | ESP32S3 Dev Module              |
| USB CDC On Boot                      | Enabled                         |
| CPU Frequency                        | 240 MHz (WiFi)                  |
| Core Debug Level                     | None                            |
| USB DFU On Boot                      | Disabled                        |
| Erase All Flash Before Sketch Upload | Disabled                        |
| Events Run On                        | Core 1                          |
| Flash Mode                           | QIO 80MHz                       |
| Flash Size                           | 16MB (128Mb)                    |
| JTAG Adapter                         | Disabled                        |
| Arduino Runs On                      | Core 1                          |
| USB Firmware MSC On Boot             | Disabled                        |
| Partition Scheme                     | 16M Flash (3MB APP/9.9MB FATFS) |
| PSRAM                                | OPI PSRAM                       |
| Upload Mode                          | UART0 / Hardware CDC            |
| Upload Speed                         | 921600                          |
| USB Mode                             | Hardware CDC and JTAG           |

We should be ready to go now.

First Program

Let’s write some super simple code, just to prove that we’re able to flash this device with the software that we’re writing.

void setup()
{
  Serial.begin(115200);

  delay(1000);
  Serial.println("T-DECK: Setup");
}

void loop()
{
  Serial.println("T-DECK: Loop");
  delay(1000);
}

The functions setup and loop should be very familiar to anyone who has written Arduino code.

The setup function is executed once, at the start. It’s normally used to set the board up. The loop function is executed repeatedly from there, until the board is turned off.

In setup, we use Serial.begin to set the baud rate for serial transmission. A delay is used to let the board settle, and then we write our first message out.

Our loop simply writes the T-DECK: Loop string once every second.

You should see something like this in your serial monitor:

T-DECK: Setup
T-DECK: Loop
T-DECK: Loop
T-DECK: Loop
T-DECK: Loop

Arduino-ESP32 is ideal for newcomers and hobby projects, as it’s simple to get running and has a lower barrier to entry. You can get basic applications working quickly.

ESP-IDF

To unlock more of your board’s power, ESP-IDF (the Espressif IoT Development Framework) is available. ESP-IDF lets you break out of the setup() and loop() structure and write task-based applications.

You’ll get better debugging and error handling, it is FreeRTOS-based, and you’ll also get updates and bug fixes as soon as they are released.

The process to get up and running can vary depending on the chip that you’re developing for. Espressif have pretty good documentation on their site, with a Getting Started guide available for each of their chips.

The ESP32-S3, which is what I’m using, is really easy to get started with.

Dependencies

First are some operating system dependencies. As above, I’m on Arch Linux so the following dependencies are what I needed:

sudo pacman -S --needed gcc git make flex bison gperf python cmake ninja ccache dfu-util libusb

ESP-IDF

The installation of ESP-IDF is quite simple. It’s just a matter of cloning their GitHub repository at a given version into a well-known directory on your machine:

cd ~
git clone -b v5.2.5 --recursive https://github.com/espressif/esp-idf.git

Tools

You can now use the install.sh script bundled with the GitHub repository to install any extra tooling required for your board.

cd ~/esp-idf
./install.sh esp32s3

Integration

Finally, you’re going to need a way to drop into the ESP-IDF environment whenever you want. You could remember to do this by hand every time you want to do any development, but I prefer to make an alias in my ~/.zshrc file.

alias get_idf='source $HOME/esp-idf/export.sh'

Now, anytime I want to drop into that environment, I simply issue get_idf at the shell.

Ready

You’re just about ready to start development. So, let’s start a new project.

Get a copy of the hello_world example from the ~/esp-idf/examples/get-started folder, and put it into your source folder somewhere (wherever you normally work from):

cp -r ~/esp-idf/examples/get-started/hello_world ~/src/tmp/hw

cd ~/src/tmp/hw

Code

Let’s take a quick look at the hello world example code:

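#include <stdio.h>
#include <inttypes.h>
#include "sdkconfig.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_chip_info.h"
#include "esp_flash.h"
#include "esp_system.h"

void app_main(void)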
{
    printf("Hello world!\n");

    /* Print chip information */
    esp_chip_info_t chip_info;
    uint32_t flash_size;
    esp_chip_info(&chip_info);
    printf("This is %s chip with %d CPU core(s), %s%s%s%s, ",
           CONFIG_IDF_TARGET,
           chip_info.cores,
           (chip_info.features & CHIP_FEATURE_WIFI_BGN) ? "WiFi/" : "",
           (chip_info.features & CHIP_FEATURE_BT) ? "BT" : "",
           (chip_info.features & CHIP_FEATURE_BLE) ? "BLE" : "",
           (chip_info.features & CHIP_FEATURE_IEEE802154) ? ", 802.15.4 (Zigbee/Thread)" : "");

    unsigned major_rev = chip_info.revision / 100;
    unsigned minor_rev = chip_info.revision % 100;
    printf("silicon revision v%d.%d, ", major_rev, minor_rev);
    if(esp_flash_get_size(NULL, &flash_size) != ESP_OK) {
        printf("Get flash size failed");
        return;
    }

    printf("%" PRIu32 "MB %s flash\n", flash_size / (uint32_t)(1024 * 1024),
           (chip_info.features & CHIP_FEATURE_EMB_FLASH) ? "embedded" : "external");

    printf("Minimum free heap size: %" PRIu32 " bytes\n", esp_get_minimum_free_heap_size());

    for (int i = 10; i >= 0; i--) {
        printf("Restarting in %d seconds...\n", i);
        vTaskDelay(1000 / portTICK_PERIOD_MS);
    }
    printf("Restarting now.\n");
    fflush(stdout);
    esp_restart();
}

Breaking this down:

  • We print "Hello world!"
  • We gather and print some chip information
  • We gather and print the flash size and the minimum free heap size
  • We count down from 10, and restart

Because the device restarts at the end of app_main, the program runs again from the top each time, effectively looping forever.

Running

Connect your device to the machine now. When I connect mine, it shows up as a /dev/tty* device:

ls /dev/tty*

/dev/ttyACM0

You’ll need to find yours on your machine, as you’ll use this device path when flashing software onto the board.

Configure

idf.py set-target esp32s3
idf.py menuconfig

The set-target step will set up the necessary configuration for that specific chip. The menuconfig step will allow you to customise any of those settings. I’ve always been fine leaving them as they are, then saving and quitting menuconfig.

Build

Now we can build.

idf.py build

After a bit of console scrolling, you should be left with some completion notes:

Executing action: all (aliases: build)
Running make in directory /home/michael/src/tmp/hw/build
Executing "make -j 10 all"...
[  0%] Built target memory.ld
[  0%] Built target sections.ld.in

. . .
. . . lots of text here
. . .

[100%] Built target hello_world.elf
[100%] Built target gen_project_binary
hello_world.bin binary size 0x2bd40 bytes. Smallest app partition is 0x100000 bytes. 0xd42c0 bytes (83%) free.
[100%] Built target app_check_size
[100%] Built target app

Project build complete. To flash, run:
idf.py flash
or
idf.py -p PORT flash
or
python -m esptool --chip esp32s3 -b 460800 --before default_reset --after hard_reset write_flash --flash_mode dio --flash_size 2MB --flash_freq 80m 0x0 build/bootloader/bootloader.bin 0x8000 build/partition_table/partition-table.bin 0x10000 build/hello_world.bin
or from the "/home/michael/src/tmp/hw/build" directory
python -m esptool --chip esp32s3 -b 460800 --before default_reset --after hard_reset write_flash "@flash_args"

Now we can flash this onto our device.

idf.py -p /dev/ttyACM0 flash

Your device should now be running your software.

You can confirm this (for this particular program) by monitoring the serial output:

idf.py -p /dev/ttyACM0 monitor

You should see some output like this:

This is esp32s3 chip with 2 CPU core(s), WiFi/BLE, silicon revision v0.2, 2MB external flash
Minimum free heap size: 393180 bytes
Restarting in 10 seconds...
Restarting in 9 seconds...
Restarting in 8 seconds...
Restarting in 7 seconds...
Restarting in 6 seconds...
Restarting in 5 seconds...
Restarting in 4 seconds...
Restarting in 3 seconds...
Restarting in 2 seconds...
Restarting in 1 seconds...
Restarting in 0 seconds...
Restarting now.
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021

. . .
. . . lots of text here
. . . 

This matches what we expected from reading through the code.

Conclusion

We’ve explored two different ways to set up and develop software for ESP32-based chips: Arduino-ESP32 for quick prototyping and ESP-IDF for professional-grade development. The LILYGO T-Deck, with its touchscreen, keyboard, and connectivity options, makes an excellent platform for embedded applications, whether you’re experimenting with IoT, cybersecurity tools, or custom handheld devices.

If you’re new to embedded development, starting with Arduino-ESP32 is a great way to get familiar with the hardware. But to unlock the full power of the ESP32-S3, including multi-threading, advanced debugging, and FreeRTOS integration, consider diving deeper into ESP-IDF.

I hope to use the information in this article as a base platform for writing more posts in the future.