16-bit COM files
26 Mar 2015
COM files are a plain binary executable format from the MS-DOS era (and before!) that provides a very simple execution model.
The program is given a single 64 KB segment, and its code, data and stack must all fit inside it. This memory model is sometimes referred to as the “tiny” model.
In today’s post, we’re going to write a really simple program, then assemble it, disassemble it and dissect it. Here’s our program, which very helpfully prints “Hello, world!” to the console and then exits.
Nothing of great interest here. The only thing worth mentioning is the ORG directive. This tells the assembler that our program will be loaded at offset 100h, so every address it calculates for a label is relative to that origin. (DOS always loads a COM file at offset 100h of its segment, directly after the 256-byte Program Segment Prefix.) There’s some more information regarding 16-bit programs with nasm here.
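To make the effect of ORG concrete, here is a toy Python model of how an assembler might assign label addresses: each label’s address is the origin plus its byte offset in the output file. The instruction list and sizes are assumptions about the program’s shape (mov dx, msg; mov ah, 09h; int 21h; ret; then the string), and assign_addresses is a hypothetical name. This is a sketch of the idea, not how nasm is actually implemented.

```python
def assign_addresses(items, origin=0):
    """Return {label: address}, where address = origin + byte offset in the file."""
    addresses = {}
    offset = 0
    for label, size in items:
        if label is not None:
            addresses[label] = origin + offset
        offset += size  # the next item starts right after this one
    return addresses

# (label, size in bytes) for each line of the ASSUMED program
program = [
    (None, 3),    # mov dx, msg
    (None, 2),    # mov ah, 09h
    (None, 2),    # int 21h
    (None, 1),    # ret
    ("msg", 16),  # 'Hello, world!', 13, 10, '$' (assumed string bytes)
]

print(hex(assign_addresses(program)["msg"]))                # without ORG
print(hex(assign_addresses(program, origin=0x100)["msg"]))  # with ORG 100h
```

With the default origin of 0, msg resolves to 0x8; with origin=0x100 it resolves to 0x108. The bytes in the file are identical either way; only the addresses baked into the instructions change.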
nasm’s default output format is plain binary, so assembling is very simple:
Running our program in DOSBox, we see our greeting and are then returned to the prompt, as promised. The binary on disk is seriously small: 24 bytes small. We won’t have much to read when we disassemble it!
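Those 24 bytes can be accounted for by hand. The Python sketch below rebuilds a plausible image byte by byte. It assumes the program is mov dx, msg / mov ah, 09h / int 21h / ret followed by “Hello, world!” with a trailing CR/LF and the '$' terminator that DOS function 09h expects. The exact string bytes are a guess that happens to add up to 24.

```python
# Hand-assembled bytes of the ASSUMED program (NASM syntax in the comments).
CODE = bytes([
    0xBA, 0x08, 0x01,  # mov dx, msg   (imm16 0x0108, little-endian, thanks to ORG 100h)
    0xB4, 0x09,        # mov ah, 09h   (DOS "print $-terminated string" function)
    0xCD, 0x21,        # int 21h
    0xC3,              # ret           (pops 0000h and jumps to the INT 20h at PSP:0000)
])
DATA = b"Hello, world!\r\n$"  # assumed CR/LF before the '$' terminator

image = CODE + DATA
print(len(CODE), len(DATA), len(image))  # 8 16 24
```

If these assumptions hold, writing image to a .com file should give something DOSBox can run, and the code occupies exactly offsets 0 through 7.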
Because this is a plain binary file with no header to describe its contents, we need to give objdump a little help by telling it the file format and the target architecture.
The full output dump is as follows:
The instructions at offsets 0 through 7 correspond directly to the assembly source we wrote. From offset 8 onwards, the file holds the string we’re going to print. objdump has no way of knowing that, so it decodes those bytes as instructions too, which is why the disassembly looks a little chaotic.
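Because a COM file has no header, nothing in the file marks where code ends and data begins; a disassembler just keeps decoding. For example, 'H' is 0x48, which in 16-bit code decodes as dec ax. A minimal Python sketch of splitting the image ourselves, assuming the code occupies the first 8 bytes and the string is “Hello, world!” plus CR/LF and '$' (the helper names split_image and dos_string are hypothetical):

```python
IMAGE = bytes([0xBA, 0x08, 0x01, 0xB4, 0x09, 0xCD, 0x21, 0xC3]) + b"Hello, world!\r\n$"
CODE_SIZE = 8  # offsets 0 through 7 hold instructions

def split_image(image, code_size):
    """Split a flat binary into (code, data); the file itself records no boundary."""
    return image[:code_size], image[code_size:]

def dos_string(data):
    """Return the text INT 21h/AH=09h would print: every byte before the first '$'."""
    return data[:data.index(b"$")].decode("ascii")

code, data = split_image(IMAGE, CODE_SIZE)
print(code.hex(" "))           # ba 08 01 b4 09 cd 21 c3
print(data.hex(" "))           # 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0d 0a 24
print(repr(dos_string(data)))  # 'Hello, world!\r\n'
```

The second hex line is the ASCII encoding of the string, which is exactly what the “chaotic” part of the disassembly is made of.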
Ignoring the gibberish mnemonics, the bytes correspond directly to our string:
So our string starts at address 8, but the first line of our assembly code, the one that loads dx with the address of our string msg, has disassembled to this:
The address $0x108 overshoots the address of our string by 0x100! This is where the ORG directive comes in. Because we specified it, nasm offsets all of our addresses by 0x100. When DOS loads our COM file, it places it at offset 0x100 within its segment, so our addresses line up perfectly.
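The whole chain can be checked with a small Python simulation. It decodes the imm16 operand of mov dx (opcode 0xBA, stored little-endian), copies the image into a zeroed 64 KB segment at offset 0x100 as DOS does, then reads from DX up to the '$' terminator the way INT 21h function 09h does. The image bytes are an assumed reconstruction, the helper names are hypothetical, and the sketch leaves the PSP area as zeros rather than modelling its real contents.

```python
import struct

IMAGE = bytes([0xBA, 0x08, 0x01, 0xB4, 0x09, 0xCD, 0x21, 0xC3]) + b"Hello, world!\r\n$"
LOAD_OFFSET = 0x100  # DOS places a COM file right after the 256-byte PSP

def mov_dx_operand(image):
    """Decode the imm16 of a leading 'mov dx, imm16' instruction (opcode 0xBA)."""
    if image[0] != 0xBA:
        raise ValueError("expected mov dx, imm16 (0xBA)")
    (imm,) = struct.unpack_from("<H", image, 1)  # x86 stores immediates little-endian
    return imm

def load_com(image):
    """Copy the image into a fresh, zeroed 64 KB segment at offset 0x100."""
    segment = bytearray(0x10000)
    segment[LOAD_OFFSET:LOAD_OFFSET + len(image)] = image
    return segment

def print_string(segment, dx):
    """Mimic INT 21h/AH=09h: return the bytes at DS:DX up to the '$' terminator."""
    end = segment.index(b"$", dx)
    return segment[dx:end].decode("ascii")

dx = mov_dx_operand(IMAGE)
print(hex(dx))                                  # 0x108
print(hex(dx - LOAD_OFFSET))                    # 0x8, the string's offset within the file
print(repr(print_string(load_com(IMAGE), dx)))  # 'Hello, world!\r\n'
```

Had the ORG directive been left out, the operand would have been 0x0008, which at run time points into the PSP rather than at our string.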