In Part 3, we explored ARM calling conventions, debugging,
and cleaned up our UART driver. While assembly has given us fine control over the hardware, writing an entire OS in
assembly would be painful.
It’s time to enter C land.
This post covers:
Modifying the bootloader to transition from assembly to C
Updating the UART driver to be callable from C
Writing our first C function (kmain())
Adjusting the Makefile and linker script for C support
Booting into C
We still need a bit of assembly to set up the stack and call kmain(). Let’s start by modifying our bootloader.
Updated bootloader.s
.section .text
.global _start
_start:
LDR sp, =stack_top @ Set up the stack
BL kmain @ Call the C kernel entry function
B . @ Hang forever
.section .bss
.align 4
stack_top:
.space 1024
What’s changed?
We load the stack pointer (sp) before calling kmain(). This ensures C has a valid stack to work with.
We branch-and-link (BL) to kmain(), following ARM’s calling conventions.
The infinite loop (B .) prevents execution from continuing into unknown memory if kmain() ever returns.
With this setup, execution will jump to kmain()—which we’ll define next.
Our First C Function: kmain()
Now that we can transition from assembly to C, let’s create our first function.
kmain.c
#include"uart.h"voidkmain(){uart_puts("Hello from C!\n");while(1);}
What’s happening?
We include our uart.h header so we can call uart_puts().
kmain() prints "Hello from C!" using our UART driver.
The infinite while(1); loop prevents execution from continuing into unknown territory.
At this point, our OS will boot from assembly, call kmain(), and print text using our UART driver—but we need to
make a few more changes before this compiles.
Making the UART Driver Callable from C
Right now, uart_puts and uart_putc are assembly functions. To call them from C, we need to:
Code starts at 0x10000, ensuring it is loaded correctly.
.text, .rodata, .data, and .bss sections are properly defined.
Build
Now that all of these changes in place, we can make our kernel and run it. If everything has gone to plan, you should
see our kernel telling us that it’s jumped to C.
Hello from C!
Conclusion
We’ve successfully transitioned from pure assembly to using C for higher-level logic, while keeping low-level
hardware interaction in assembly.
The code for this article is available in the GitHub repo.
In Part 2, we implemented a basic UART driver to send text
output from our OS. We put some functions together that deal with single characters and strings.
Today’s article has a focus on some calling conventions, debugging, and hopefully making that UART driver a little
more robust.
Let’s take a hot lap of the ARM architecture first.
Registers
ARM processors have a well-defined set of registers, which can be categorized based on their usage. Below is a
breakdown of all the ARM registers, including their general purpose, special purpose, and system control registers.
General purpose registers
These registers are used for storing temporary values and passing function arguments.
Register
Purpose
r0–r11
Function arguments & return values (general purpose)
r12
Intra-procedure scratch register (IP, sometimes used as a temporary register)
Note: ARM conventions say that r0-r3 and r12 can be freely modified by functions, and the caller must save them if needed. r4-r11 should be restored by the function before returning.
Special purpose registers
These registers serve specific roles in function calls, memory access, and system control.
Register
Purpose
r13sp
(Stack Pointer) Points to the top of the current stack
r14lr
(Link Register) Stores return address for function calls
r15pc
(Program Counter) Holds the address of the next instruction to execute
lr (r14) is used for storing return addresses when calling functions.
pc (r15) is automatically incremented as instructions execute. You can branch by writing directly to pc.
Program Status Registers
The Current Program Status Register (CPSR) and Saved Program Status Register (SPSR) store flags and mode-related information.
Register
Purpose
CPSR
Holds flags, processor mode, and interrupt status
SPSR
Stores CPSR when entering an exception mode
Key CPSR flags (Condition Flags):
N (Negative) – Set if the result of an operation is negative.
Z (Zero) – Set if the result of an operation is zero.
C (Carry/Borrow) – Set if an operation results in a carry/borrow.
V (Overflow) – Set if an arithmetic operation overflows.
Processor Mode Bits (M[4:0]):
0b10000 – User mode
0b10001 – FIQ (Fast Interrupt) mode
0b10010 – IRQ (Normal Interrupt) mode
0b10011 – Supervisor mode
0b10111 – Abort mode
0b11011 – Undefined instruction mode
0b11111 – System mode (privileged user mode)
Banked Registers (Mode-Specific Registers)
ARM has banked registers that are only accessible in specific processor modes (e.g., IRQ, FIQ). These registers allow
fast context switching between different execution states.
Mode
Extra Registers Available
FIQ Mode
r8_fiq – r14_fiq (separate registers for FIQ context)
IRQ Mode
r13_irq, r14_irq (separate SP and LR for IRQ)
Supervisor Mode
r13_svc, r14_svc (separate SP and LR for SVC)
Abort Mode
r13_abt, r14_abt
Undefined Mode
r13_und, r14_und
Why banked registers?
Interrupt handlers can run without disturbing normal user-space registers.
Faster execution because it eliminates the need to save/restore shared registers.
Debug Registers (ARMv7+)
ARM processors often include special registers for debugging, including breakpoints and watchpoints.
Register
Purpose
DBGDSCR
Debug Status and Control Register
DBGBVR
Breakpoint Value Register
DBGBCR
Breakpoint Control Register
DBGWVR
Watchpoint Value Register
DBGWCR
Watchpoint Control Register
Understanding ARM Calling Conventions
ARM assembly follows a convention for passing function parameters and preserving registers:
Caller-saved registers (r0-r3, r12): These are freely used by functions and must be saved by the caller if needed.
Callee-saved registers (r4-r11, lr): A function must preserve and restore these if it modifies them.
Return values: r0 holds the return value.
Understanding this is key to writing reliable functions.
Upgrading uart_puts
We’re going to upgrade our uart_puts to “behave” a little nicer for us.
uart_puts:
push {lr} @ Save return address
next_char:
ldrb r1, [r0], #1 @ Load byte from string and increment pointer
cmp r1, #0 @ Check if null terminator
beq done @ If so, return
wait_uart:
ldr r2, =UART0_FR @ Load address of UART flag register
ldr r3, [r2] @ Read UART flag register
tst r3, #TXFF @ Check if TX FIFO is full
bne wait_uart @ If full, wait
ldr r2, =UART0_DR @ Load address of UART data register
str r1, [r2] @ Write character to UART
b next_char @ Process next character
done:
pop {lr} @ Restore return address
bx lr @ Return
Let’s break this down piece by piece.
We save off lr (the link register) which is our return address from where we were called.
uart_puts:
push {lr} @ Save return address
ldrb takes the next source byte from our string, and we check if we’re finished. next_char is the loop point that
we come back to, to process the remainder of the string.
next_char:
ldrb r1, [r0], #1 @ Load byte from string and increment pointer
cmp r1, #0 @ Check if null terminator
beq done @ If so, return
Next we wait for the UART buffer in case it’s full
wait_uart:
ldr r2, =UART0_FR @ Load address of UART flag register
ldr r3, [r2] @ Read UART flag register
tst r3, #TXFF @ Check if TX FIFO is full
bne wait_uart @ If full, wait
Using str we write that source byte out to the UART data register, and continue to the next character in the loop.
ldr r2, =UART0_DR @ Load address of UART data register
str r1, [r2] @ Write character to UART
b next_char @ Process next character
We finish up by restoring lr before returning to where we were called from.
done:
pop {lr} @ Restore return address
bx lr @ Return
Debugging ARM Assembly
Debugging low-level assembly can be challenging. Here are some useful techniques to diagnose function issues.
Print Debug Markers
One of the simplest ways to trace execution is to print special debug characters in the UART output:
MOV r0, #'!'
BL uart_putc @ Print a debug marker to trace execution
If your code stops working after a certain point, inserting markers helps pinpoint where execution breaks.
Step Through Execution in QEMU
QEMU provides debugging features that let us step through execution. Start QEMU with GDB support:
In the first article of this series, we built a basic ARM
bootloader and ran it in QEMU. However, debugging without output can be frustrating. In this part, we’ll set up
UART
(Universal Asynchronous Receiver-Transmitter) to send simple text messages from our OS.
UART is a fundamental interface used for serial communication in embedded systems. It allows us to
send and receive characters over a hardware interface, making it useful for early-stage debugging before more
complex peripherals like displays or networking are available.
By the end of this article, we’ll have a basic UART driver that prints text to the terminal using QEMU.
What is UART?
UART is a hardware component that enables serial communication by sending and receiving data one bit at a time. It
typically operates over two wires:
TX (Transmit): Sends data.
RX (Receive): Receives data.
In most ARM-based systems, UART is memory-mapped, meaning we can control it by writing to specific memory addresses.
Configuring UART in QEMU
We’ll use PL011 UART, a common ARM serial interface. QEMU
provides an emulated PL011 UART at a known memory address, allowing us to send text output to the terminal.
To enable UART output, we need to run QEMU with the following command:
-serial stdio redirects UART output to our terminal.
-nographic ensures we run without a graphical display.
You may receive an error message like the following:
qemu-system-arm: -serial stdio: cannot use stdio by multiple character devices
qemu-system-arm: -serial stdio: could not connect serial device to character backend 'stdio'
This is just telling you that you’re already redirecting the serial output because of the -nographic switch. If you
do see this, you’re free to simply drop the -serial stdio.
With this setup, once we implement our UART driver, we’ll see printed text appear in the terminal.
Writing a UART Driver
PL011 UART Memory-Mapped Registers
The PL011 UART controller is accessible via memory-mapped I/O registers. The key register we need for output is at
address 0x101f1000 and is called the UART Data Register (DR). Writing a byte to this register sends a character
over UART.
Implementing UART Functions
We create a new file, uart.s, in the asm/ directory:
.equ UART0_DR, 0x101f1000
.section .text
.global uart_putc
.global uart_puts
uart_putc:
STRB r0, [r1] @ Store byte from r0 into UART data register
BX lr
uart_puts:
LDR r1, =UART0_DR
1:
LDRB r0, [r2], #1 @ Load byte from string, increment pointer
CMP r0, #0 @ Check for null terminator
BEQ 2f @ Branch to 2 if we are done
BL uart_putc @ If not, call putc
B 1b @ Keep looping
2:
BX lr @ Return to caller
uart_putc(char): Sends a single character to the UART register.
uart_puts(string): Iterates through a null-terminated string and sends each character.
Printing a Message from the Bootloader
Now that we have a UART driver, we can modify our bootloader (bootstrap.s) to print a message.
We now have basic UART output working in our OS! This is a critical step because it allows us to debug our OS by
printing messages. As always you find the code up in my GitHub
repository.
With this foundational work in place, we’re one step closer to a functional ARM-based OS. Stay tuned for Part 3!
In this series, we’ll build a small operating system for the ARM platform from the ground up. Along the way, we’ll
explore fundamental OS concepts and incrementally add components, turning abstract ideas into working code. Each
article will focus on a specific piece of the system, guiding you through the process step by step.
We’re going to use QEMU, an open-source emulator, so we can develop and test our code directly on a PC—no hardware
required (for now).
In Part 1, we’ll first discuss ARM, then move on to the following:
ARM, short for Advanced RISC Machine, is a family of Reduced Instruction Set Computing (RISC) architectures that power
billions of devices, from smartphones and tablets to embedded systems and IoT devices. Originally known as Acorn RISC
Machine, ARM has become a cornerstone of modern computing due to its energy efficiency and simplicity compared to
Complex Instruction Set Computing (CISC) architectures like x86. Designed around the RISC philosophy, ARM processors
use a small, highly optimized instruction set, enabling greater performance per watt and making them ideal for
low-power and mobile environments.
Why Emulation?
While ARM assembly is usually executed on physical devices, emulation tools like QEMU allow
you to:
Test code without requiring hardware.
Experiment with different ARM-based architectures and peripherals.
Debug programs more effectively using tools like GDB.
Supported ARM Hardware
Before we begin coding, let’s take a brief look at some popular ARM-based platforms:
Raspberry Pi: A widely used single-board computer.
BeagleBone Black: A powerful option for embedded projects.
STM32 Microcontrollers: Common in IoT and robotics applications.
Installing prerequisites
Before we begin, we need to setup our development and build environment. I’m using Manjaro so package
names might be slightly different for your distro of choice.
To build our software, we’ll install the arm-none-eabi toolchain, which provides the assembler (as), linker (ld),
and other essential utilities.
Now that our project structure is in place, we can begin writing our first piece of assembly code: the bootloader.
If we add bootstrap.s to the asm folder we can make a start on the bootloader.
.section .text
.global _start
_start:
LDR sp, =stack_top @ initialize the stack pointer
BL kernel_main @ jump to the kernel main loop
kernel_main:
1: B 1b @ infinite loop to keep the OS running
B . @ fallback loop
.section .bss
.align 4
stack_top:
.space 1024 @ allocate 1kb for the stack
This is a pretty basic module to begin with. At the start we define our code with a .text section and _start is a
global symbol:
.section .text
.global _start
Next, we setup our stack pointer sp by loading the address of our stack_top. The equal sign preceeding stack_top
tells the assembler to load the immediate value of the address. We have stack_top defined down a little further.
Then, we jump on to our kernel.
Interesting note, BL which is Branch with Link works very much like a branch (B) but it will store the address
from where we branched into the link register r14.
_start:
LDR sp, =stack_top @ initialize the stack pointer
BL kernel_main @ jump to the kernel main loop
Now we have two endless loops setup. The first one loops back to the 1: loop:
1: B 1b @ infinite loop to keep the OS running
If we do get an unexpected address sneak in for whatever reason, we’ve got a fallback loop that continually jumps to
itself using the shorthand ..
B . @ fallback loop
Finally, we complete the module by defining our stack with the .bss section. You’ll notice the stack_top label that
we referenced earlier.
.section .bss
.align 4
stack_top:
.space 1024 @ allocate 1kb for the stack
Build environment
We need to make this easy to build, so we create a Makefile in the root directory. The Makefile will use the
toolchain that we installed earlier, building our binaries into the build folder:
Our call out to our assembler is pretty straight forward, trading our .s files for .o object files. We use
-Ttext 0x0 to explicitly tell the linker that our program should start at address 0x0, which is necessary for
bare-metal environments.
Give it a build.
make
All going well you should see some output as follows:
-nographic runs qemu without a graphical user interface
If everything works, you won’t see much—your bootloader is running in an infinite loop, waiting for further development.
Debugging
Because we are running in a virtualised environment, we have a full debugger at our disposal. Having a debugger attached
to your code when things aren’t quite going to plan can be very valuable to understand what’s happening in the internals
of your program.
Using the -gdb option, you can instruct qemu to open a debugging port.
For deployment, we’ll use a Raspberry Pi as an example. This process is similar for other ARM-based boards.
Flashing
First, we need to convert the ELF file to a raw binary format suitable for booting:
arm-none-eabi-objcopy -O binary boot.elf boot.bin
Use a tool like dd to write the binary to an SD card:
Caution: Be very careful with the dd command! Double-check /dev/sdX before running it to avoid overwriting important data.
dd if=boot.bin of=/dev/sdX bs=512 seek=2048
Conclusion
In this post, we’ve built the foundation for armos. We’ve installed and configured the ARM toolchain, set up QEMU
to emulate our target board, organized our project directory for clarity and scalability, and even wrote a simple
bootloader to jumpstart our operating system. With these critical components in place, you’re now ready to embark on
the next steps—enhancing the bootloader, adding essential kernel functionalities, and ultimately constructing a
full-fledged minimalist OS.
In this update, we’re introducing first-class functions to our Lisp interpreter! This allows users to define and
use functions just like they would in a real Lisp environment.
Before this update, function definitions looked like this:
If define receives a list as the first argument, it assumes we’re defining a function.
The first element of that list (funcName) is the function name.
The remaining elements (params) are the parameter names.
The function body is stored as a lambda expression.
Example
With this change, both of these definitions are now equivalent:
;; Using explicit lambda(definesquare(lambda(x)(*xx)));; Using shorthand function definition(define(squarex)(*xx))(square4);; 16
This makes function definitions much cleaner and easier to read!
Fixing if Expressions
While making these changes, we discovered a bug in our if implementation.
Previously, ifdid not evaluate the then or else branch before returning it:
evalenv(List[Atom"if",condition,thenExpr,elseExpr])=doresult<-evalenvconditioncaseresultofBoolTrue->returnthenExprBoolFalse->returnelseExpr_->throwError$TypeMismatch"Expected boolean in if condition"result
This meant that calling:
(if(>105)(*22)(*33))
Would return(* 2 2) instead of 4!
The Fix
We simply added eval env to ensure the chosen branch is evaluated before being returned:
evalenv(List[Atom"if",condition,thenExpr,elseExpr])=doresult<-evalenvconditioncaseresultofBoolTrue->evalenvthenExprBoolFalse->evalenvelseExpr_->throwError$TypeMismatch"Expected boolean in if condition"result
Now, if expressions behave correctly:
(if(>105)(*22)(*33));; 4(if(<105)(*22)(*33));; 9
Recursion
With function definitions and the fixed if statement, we can now write recursive functions!