Build your own x86 Kernel Part 4
05 Oct 2025
Introduction
In Part 3 we finalised our boot loader, so that it now successfully loads Stage 2 for us. In this post, we’ll focus on setting up the system so that we unlock the CPU’s more advanced features.
Inside Stage 2 we’ll do the following:
- Enable the A20 line
- Set up a Global Descriptor Table (GDT)
- Switch to 32-bit Protected Mode
By the end of this article, we’ll be running in 32-bit protected mode.
A20 Line
Before we can enter 32-bit protected mode, we need to enable the A20 line.
Back in the original Intel 8086, there were only 20 address lines, A0 through A19, meaning it could address 1 MiB of memory (from 0x00000 to 0xFFFFF). When Intel introduced the 80286, it gained four more address lines (24 in total) and could access memory above 1 MiB. However, to remain compatible with older DOS software that relied on address wrap-around (where 0xFFFFF + 1 rolled back to 0x00000), IBM added a hardware gate on address line 20 in the PC/AT: the A20 gate.
When the A20 line is disabled, physical address bit 20 is forced to 0.
So every odd-numbered megabyte aliases the even one below it: 0x100000 looks the same as 0x000000.
When A20 is enabled, memory above 1 MiB becomes accessible. Protected-mode code, paging, and modern kernels all assume that A20 is on.
Enabling A20
To enable the A20 line, we use the Fast A20 Gate (port 0x92).
Most modern systems and emulators expose bit 1 of port 0x92 (the “System Control Port A”) as a direct A20 enable bit.
- Bit 0 — system reset (don’t touch this)
- Bit 1 — A20 gate (1 = enabled)
We add the following to do this:
%define A20_GATE 0x92
in al, A20_GATE ; read system control port A
or al, 0x02 ; set bit 1 (A20 enable)
and al, 0xFE ; clear bit 0 (reset)
out A20_GATE, al
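Not every machine honours the fast gate, so it can be worth confirming that A20 really is on before carrying on. The routine below is a minimal sketch of the classic wrap-around test, assuming real mode and that the bytes at 0000:0500 and FFFF:0510 are safe to touch briefly (both are restored afterwards). The name check_a20 and the choice of test addresses are assumptions for illustration; they are not part of the Stage 2 listing.

```nasm
BITS 16
; check_a20 (hypothetical helper): AX = 1 if A20 is enabled, 0 if it is not.
; 0000:0500 is physical 0x000500; FFFF:0510 is physical 0x100500.
; With A20 off, bit 20 is masked and both refer to the same byte.
check_a20:
    pushf
    push ds
    push es
    push si
    push di
    cli
    xor ax, ax
    mov es, ax              ; ES = 0x0000
    not ax
    mov ds, ax              ; DS = 0xFFFF
    mov di, 0x0500          ; ES:DI -> 0x000500
    mov si, 0x0510          ; DS:SI -> 0x100500
    mov al, [es:di]
    push ax                 ; save the original low byte
    mov al, [ds:si]
    push ax                 ; save the original high byte
    mov byte [es:di], 0x00
    mov byte [ds:si], 0xFF
    cmp byte [es:di], 0xFF  ; did the high write land on the low byte?
    pop ax
    mov [ds:si], al         ; restore the high byte (mov/pop leave flags alone)
    pop ax
    mov [es:di], al         ; restore the low byte
    mov ax, 0
    je .done                ; equal: addresses wrapped, A20 is off
    mov ax, 1
.done:
    pop di
    pop si
    pop es
    pop ds
    popf
    ret
```

A caller would `call check_a20` after the write to port 0x92 and test AX; if it comes back 0, falling back to another method (such as the keyboard controller) is the usual next step.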
Global Descriptor Table (GDT)
When the CPU is in real mode, memory addressing is done through segment:offset pairs. The value in each segment register (CS, DS, SS, etc.) is shifted left by 4 to form a base address, and the offset is added to that. This gives you access to 1 MiB of address space: the legacy 8086 model.
When we switch to protected mode, the segmentation model changes. Instead of using raw segment values, each segment register now holds a selector — an index into a table called the Global Descriptor Table (GDT).
The GDT tells the CPU what each segment means:
- Its base address
- Its size (limit)
- Its access rights (present, privilege level, code or data, readable/writable)
- Its flags (granularity, 16-bit or 32-bit default size)
The descriptor layout in 32-bit mode looks like this:
Bits | Field | Description |
---|---|---|
0-15 | Limit (low) | Segment limit (low 16 bits) |
16-31 | Base (low) | Segment base address (low 16 bits) |
32-39 | Base (mid) | Segment base (middle 8 bits) |
40-47 | Access Byte | Type, privilege level, presence |
48-51 | Limit (high) | High 4 bits of segment limit |
52-55 | Flags | Granularity, 32-bit flag, etc. |
56-63 | Base (high) | Segment base (high 8 bits) |
In our boot setup, we’ll create a very simple GDT with:
- A null descriptor (required; selector 0 is invalid by design).
- A code segment descriptor — flat 4 GiB region, readable, executable.
- A data segment descriptor — flat 4 GiB region, readable, writable.
This gives us a flat memory model, where all segments start at base 0 and cover the entire address space. That makes protected-mode addressing behave like one flat, linear address space, simplifying everything until paging and virtual memory come later.
Once that GDT is loaded with lgdt, we can safely set the PE (Protection Enable) bit in CR0 and perform a far jump into 32-bit protected mode code.
Defining the GDT
We define our GDT as three quad words. One for null, one for code, and one for data.
align 8
; --- GDT for entering 32-bit PM (null, code, data) ---
gdt32:
dq 0x0000000000000000 ; null
dq 0x00CF9A000000FFFF ; 0x08: 32-bit code, base=0, limit=4GiB
dq 0x00CF92000000FFFF ; 0x10: 32-bit data, base=0, limit=4GiB
gdt32_end: ; end of the table itself, so the limit below covers only the 3 entries
gdt32_desc:
dw gdt32_end - gdt32 - 1 ; limit = (size of GDT - 1)
dd gdt32 ; base = address of GDT
Breaking down the 32-bit code descriptor:
0x00CF9A000000FFFF
If we split this into its fields, lowest bits first (the order they sit in memory on little-endian x86):
[FFFF] [0000] [00][9A] [CF][00]
We can now start to map these to the fields:
Field | Value | Meaning |
---|---|---|
Limit (low 16) | 0xFFFF | segment limit = 0xFFFF |
Base (low 16) | 0x0000 | base = 0x00000000 |
Base (mid 8) | 0x00 | base = 0x00000000 |
Access Byte | 0x9A | flags that define “code, ring 0, present” |
Limit (high 4) + flags | 0xCF | limit high nibble=0xF, flags=0xC |
Base (high 8) | 0x00 | base = 0x00000000 |
The full 20-bit limit of 0xFFFFF (low 16 bits 0xFFFF plus high nibble 0xF) and the granularity bit (G=1) combine to make the segment 4 GiB in size. With G=1 the limit is counted in 4 KiB units, so (0xFFFFF + 1) × 4 KiB = 4 GiB.
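To make the magic numbers less opaque, the same descriptors can be assembled from named bits. This is only a sketch: the constant names, the GDT_ENTRY macro, and the gdt_sketch label are hypothetical and do not appear in the Stage 2 listing. It is meant to show how 0x9A, 0x92, and 0xCF fall out of the individual fields.

```nasm
; Access byte bits (hypothetical names)
ACC_PRESENT  equ 1 << 7   ; P: segment is present
ACC_RING0    equ 0 << 5   ; DPL = 0
ACC_CODEDATA equ 1 << 4   ; S: code/data segment (not a system segment)
ACC_EXEC     equ 1 << 3   ; E: executable (code segment)
ACC_RW       equ 1 << 1   ; readable (code) or writable (data)

; Flag nibble bits (hypothetical names)
FLAG_GRAN4K  equ 1 << 3   ; G: limit counted in 4 KiB units
FLAG_32BIT   equ 1 << 2   ; D/B: 32-bit default operand size

; GDT_ENTRY base, limit (20-bit), access, flags (4-bit)
%macro GDT_ENTRY 4
    dw (%2) & 0xFFFF                                  ; limit 15:0
    dw (%1) & 0xFFFF                                  ; base 15:0
    db ((%1) >> 16) & 0xFF                            ; base 23:16
    db (%3)                                           ; access byte
    db ((((%2) >> 16) & 0x0F) | (((%4) & 0x0F) << 4)) ; limit 19:16 + flags
    db ((%1) >> 24) & 0xFF                            ; base 31:24
%endmacro

CODE_ACCESS equ ACC_PRESENT | ACC_RING0 | ACC_CODEDATA | ACC_EXEC | ACC_RW ; 0x9A
DATA_ACCESS equ ACC_PRESENT | ACC_RING0 | ACC_CODEDATA | ACC_RW            ; 0x92
FLAT_FLAGS  equ FLAG_GRAN4K | FLAG_32BIT                                   ; 0xC

align 8
gdt_sketch:
    GDT_ENTRY 0, 0,       0,           0           ; null
    GDT_ENTRY 0, 0xFFFFF, CODE_ACCESS, FLAT_FLAGS  ; same bytes as 0x00CF9A000000FFFF
    GDT_ENTRY 0, 0xFFFFF, DATA_ACCESS, FLAT_FLAGS  ; same bytes as 0x00CF92000000FFFF
```

The raw dq constants are shorter, but spelling out the bits tends to pay off once more descriptors (user-mode segments, a TSS) are added.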
Loading the GDT
Now that we have our GDT defined, we can use lgdt to load it and then set the PE bit in CR0.
cli
lgdt [gdt32_desc]
mov eax, cr0
or eax, 1 ; CR0.PE=1
mov cr0, eax
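The operand to lgdt is a 6-byte structure in memory: a 16-bit limit first, and then the 32-bit linear address of where the GDT starts. When lgdt runs with a 16-bit operand size, as it does here in real mode, only the low 24 bits of that address are loaded, which is fine for a table that lives just above 0x8000.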
Protected Mode
With the GDT now loaded and PE set, we’re ready to move over to protected mode. This is 32-bit protected mode, so we’re jumping into code that needs the [BITS 32] directive.
; selectors: 0x08 = code32, 0x10 = data32
jmp 0x08:pm_entry ; far jump to load 32-bit CS
[BITS 32]
pm_entry:
mov ax, 0x10 ; 0x10 = data32
mov ds, ax
mov es, ax
mov ss, ax
mov fs, ax
mov gs, ax
mov esp, 0x90000 ; temporary 32-bit stack
.hang:
hlt
jmp .hang
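We make our far jump into 32-bit land. This jump both loads CS with the 32-bit code selector and flushes the prefetch queue, which may still hold instructions decoded for real mode; it’s the required way to complete the switch into protected mode.
Immediately afterwards we set all of our data segment registers to 0x10, which selects the data entry in the GDT.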
We’re now in 32-bit protected mode.
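The values 0x08 and 0x10 are not arbitrary: a selector is the byte offset of its descriptor within the GDT (index × 8), with the low three bits holding the table indicator and requested privilege level, both 0 here. One way to avoid hard-coding them is to let the assembler derive them from labels. The sketch below is a variant of the table with two extra labels, gdt32_code and gdt32_data, and two constants, CODE_SEL and DATA_SEL; all four names are hypothetical.

```nasm
align 8
gdt32:
    dq 0x0000000000000000         ; null
gdt32_code:                       ; hypothetical label for entry 1
    dq 0x00CF9A000000FFFF         ; 32-bit code, base=0, limit=4GiB
gdt32_data:                       ; hypothetical label for entry 2
    dq 0x00CF92000000FFFF         ; 32-bit data, base=0, limit=4GiB
gdt32_end:

CODE_SEL equ gdt32_code - gdt32   ; 0x08: index 1, TI=0 (GDT), RPL=0
DATA_SEL equ gdt32_data - gdt32   ; 0x10: index 2, TI=0 (GDT), RPL=0
```

With these in place the far jump could be written as `jmp CODE_SEL:pm_entry` and the segment loads as `mov ax, DATA_SEL`, so the selectors stay correct if entries are ever reordered.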
Stage 2 (full listing)
Our current code for Stage 2 now looks like this:
; ---------------------------------------------------------
; boot/stage2.asm — loaded by MBR at 0000:8000 (LBA 1..16)
; ---------------------------------------------------------
BITS 16
ORG 0x8000
%define A20_GATE 0x92
start2:
cli
xor ax, ax
mov ds, ax ; ds = 0 so labels assembled with ORG work as absolute
mov es, ax
cld ; count upwards
sti
call serial_init
mov si, stage2_msg
call serial_puts
in al, A20_GATE ; A20 fast
or al, 0x02
and al, 0xFE
out A20_GATE, al
mov si, a20_msg
call serial_puts
cli
lgdt [gdt32_desc]
mov si, gdt_msg ; report while still in real mode
call serial_puts
mov eax, cr0
or eax, 1 ; CR0.PE=1
mov cr0, eax ; the far jump below should follow immediately
; selectors: 0x08 = code32, 0x10 = data32
jmp 0x08:pm_entry ; far jump to load 32-bit CS
stage2_msg db "Stage2: OK", 13, 10, 0
a20_msg db "A20 Line: Enabled", 13, 10, 0
gdt_msg db "GDT: Loaded", 13, 10, 0
%include "boot/serial16.asm"
[BITS 32]
pm_entry:
mov ax, 0x10 ; 0x10 = data32
mov ds, ax
mov es, ax
mov ss, ax
mov fs, ax
mov gs, ax
mov esp, 0x90000 ; temporary 32-bit stack
mov esi, pm_msg
call serial_puts32
.hang:
hlt
jmp .hang
align 8
; --- GDT for entering 32-bit PM (null, code, data) ---
gdt32:
dq 0x0000000000000000 ; null
dq 0x00CF9A000000FFFF ; 0x08: 32-bit code, base=0, limit=4GiB
dq 0x00CF92000000FFFF ; 0x10: 32-bit data, base=0, limit=4GiB
gdt32_end: ; end of the table itself, so the limit below covers only the 3 entries
gdt32_desc:
dw gdt32_end - gdt32 - 1
dd gdt32
pm_msg db "Entered protected mode ...", 13, 10, 0
%include "boot/serial32.asm"
Notes
I’ve had to duplicate the serial assembly file. Originally it was 16-bit only, but now we need 32-bit support.
These routines look a lot like their 16-bit counterparts:
; ---------------------------------------------------------
; serial32.asm — COM1 (0x3F8) UART helpers for 32-bit PM
; ---------------------------------------------------------
[BITS 32]
%define COM1 0x3F8
; LSR bits: 0x20 = THR empty, 0x40 = TSR empty
; init: 115200 8N1, FIFO on
serial_init32:
push eax
push edx
; IER=0 (disable UART interrupts)
mov dx, COM1 + 1
xor eax, eax
out dx, al
; DLAB=1
mov dx, COM1 + 3
mov al, 0x80
out dx, al
; divisor = 1 (DLL=1, DLM=0)
mov dx, COM1 + 0
mov al, 0x01
out dx, al
mov dx, COM1 + 1
xor al, al
out dx, al
; 8N1, DLAB=0
mov dx, COM1 + 3
mov al, 0x03
out dx, al
; FIFO enable/clear, 14-byte trigger
mov dx, COM1 + 2
mov al, 0xC7
out dx, al
; MCR: DTR|RTS|OUT2
mov dx, COM1 + 4
mov al, 0x0B
out dx, al
pop edx
pop eax
ret
; wait until THR empty
serial_wait_tx32:
push eax
push edx
mov dx, COM1 + 5
.wait:
in al, dx
test al, 0x20
jz .wait
pop edx
pop eax
ret
; putc: AL = character
serial_putc32:
push edx
call serial_wait_tx32
mov dx, COM1
out dx, al
pop edx
ret
; putc with '\n' -> "\r\n"
serial_putc_nl32:
cmp al, 10 ; '\n'
jne .send
push eax
mov al, 13 ; '\r'
call serial_putc32
pop eax
.send:
jmp serial_putc32
; puts: ESI -> zero-terminated string
serial_puts32:
push eax
push esi
.next:
lodsb ; AL = [ESI], ESI++
test al, al
jz .done
call serial_putc_nl32
jmp .next
.done:
pop esi
pop eax
ret
Running
Getting this built and running now, we can see that we’re successfully in 32-bit protected mode.
➜ make run
qemu-system-x86_64 -drive file=os.img,format=raw,if=ide,media=disk -serial stdio -debugcon file:debug.log -global isa-debugcon.iobase=0xe9 -display none -no-reboot -no-shutdown -d guest_errors,cpu_reset -D qemu.log
Booting ...
Starting Stage2 ...
Stage2: OK
A20 Line: Enabled
GDT: Loaded
Entered protected mode ...
Conclusion
We’ve now built the minimal foundation of a protected-mode operating system: flat memory model, GDT, and a working serial console. From this point on, we can start using true 32-bit instructions and data structures. In the next post, we’ll extend this with an Interrupt Descriptor Table (IDT), Programmable Interval Timer (PIT), and paging, preparing the system for 64-bit long mode.