Tagged Unions

27 May 2025

Introduction

In a previous post, we built a dynamic variant type in C that could safely hold integers, floats, strings, and arrays — all while tracking its current type.

That implementation used a simple, effective trick from the C toolbox: the tagged union.

But what is a tagged union, really? What does it look like in memory? How does it behave across different languages like Rust or Swift? And how can understanding this help you write better low-level code?

In this article, we’ll go deep into the mechanics of tagged unions — demystifying how they work, how they’re laid out in memory, and how other languages leverage this idea at a higher level.

What is a Tagged Union?

A tagged union is a structure that lets you store different types of data in the same memory space — but only one of them at a time.

It consists of two parts:

A tag (also called a discriminator or type id) — this tells you what kind of value the union currently holds.
A union — this is a single memory region that can represent multiple types, but only one is valid at any given time.

Together, they form a tagged union, also known as:

Variant type (C++)
Discriminated union (TypeScript)
Sum type (Haskell, Rust)
enum with payloads (Rust, Swift, F#)

A Simple C Implementation

Let’s take a minimal version of the ced_var_t type we saw earlier:

typedef enum {
    TYPE_INT32,
    TYPE_FLOAT,
    TYPE_STRING,
} value_type_t;

typedef struct {
    value_type_t type;

    union {
        int32_t   i32;
        float     f;
        char*     str;
    } data;
} tagged_value_t;

Here:

type is the tag.
data is the payload.

When you store a value, you must set both correctly:

tagged_value_t v;
v.type = TYPE_INT32;
v.data.i32 = 42;

When you want to read the value, you must check the tag first.

What Does This Look Like in Memory?

Let’s visualize what’s in memory for the above:

+--------+---------+------------------+
| tag    | padding | data             |
| (4B)   | (4B)    | (8B - union)     |
+--------+---------+------------------+

Because str is a char*, which is typically 8 bytes on 64-bit systems, the entire union needs to reserve 8 bytes. So the full struct is:

4 bytes for type (as “tag”)
4 bytes padding (for alignment)
8 bytes for union

Total: 16 bytes

Even if you’re only storing a small int32_t, the full memory block remains the same.

Keep in mind! This is a classic example of how low-level memory alignment and padding rules affect real-world struct layout — and why tools like sizeof() sometimes surprise you.

Why Is This Useful?

Tagged unions give you flexibility:

Store different types dynamically
Inspect and branch on their type at runtime
Compact representation (compared to void* + metadata hacks)
Zero-cost abstraction in low-level code

They’re great for interpreters, configuration systems, scripting languages, or anything where types can vary dynamically.

But It’s Also Dangerous

C won’t stop you from misusing the union:

v.type = TYPE_FLOAT;
v.data.i32 = 1234;  // Invalid — tag and payload don't match!

Accessing a union member that doesn’t match the tag is undefined behavior. It might work. Or it might crash. Or silently corrupt your data.

That’s why you must manage the tag manually, and treat it as the single source of truth.

How Rust Does It Better

In Rust, you’d write this as:

enum Value {
    Int32(i32),
    Float(f32),
    Str(String),
}

Rust does three major things better:

Enforces valid construction — you can’t create an invalid state.
Requires match exhaustiveness — you have to handle all cases.
Uses niche optimization — it can sometimes omit the tag entirely.

For example, Option<&T> takes no extra space. Rust uses the null pointer as the tag for None.

Comparison with Other Languages

Language	Feature	Example
C	Tagged union (manual)	`struct + union + enum`
Rust	Sum type with safety	`enum Value { Int(i32), ... }`
Swift	`enum` with associated data	`enum Value { case int(Int), ... }`
TypeScript	Discriminated union	`{ kind: "int", value: number }`
Haskell/OCaml	Algebraic data type	`data Value = Int Int \| Float Float`

Performance Considerations

Tagged unions are generally compact, but there are trade-offs:

The size of the union is dictated by the largest member.
On 64-bit platforms, alignment often causes padding (e.g., 4-byte tag + 4-byte pad + 8-byte payload).
If a variant holds something heap-allocated (like strings), you may incur pointer-chasing and memory fragmentation.

You can use tools like sizeof(), offsetof(), or pahole to understand the exact layout.

Debugging Tips

If you’re building or debugging a tagged union system in C:

Use assert() to guard tag-to-union access:

  assert(v->type == TYPE_INT32);

Use valgrind to catch misuses or memory leaks.
Inspect raw memory using gdb, hexdump, or print helpers.
Consider printing the tag as a string to make logs easier to read.

Cogs and Levers A blog full of technical stuff

Tagged Unions

Introduction

What is a Tagged Union?

A Simple C Implementation

What Does This Look Like in Memory?

Why Is This Useful?

But It’s Also Dangerous

How Rust Does It Better

Comparison with Other Languages

Performance Considerations

Debugging Tips

Further Reading

Cogs and Levers A blog full of technical stuff

Tagged Unions

Introduction

What is a Tagged Union?

A Simple C Implementation

What Does This Look Like in Memory?

Why Is This Useful?

But It’s Also Dangerous

How Rust Does It Better

Comparison with Other Languages

Performance Considerations

Debugging Tips

Further Reading

Related Posts

Traits vs Typeclasses - A Deep Comparison 28 Jun 2025

Algebraic Effects in Modern Languages 27 Jun 2025

Getting NGINX to Do Things 26 Jun 2025