High Performance Linux IO with IO_URING
23 Dec 2024
Introduction
IO_URING is an advanced asynchronous I/O interface introduced in the Linux kernel (version 5.1). It’s designed to provide significant performance improvements for I/O-bound applications, particularly those requiring high throughput and low latency.
It’s well worth taking a look at the Linux man pages for io_uring and having a read through the function interface.
In today’s article we’ll discuss IO_URING in depth and follow with some examples to see it in practice.
What is IO_URING
IO_URING is a high-performance asynchronous I/O interface introduced in Linux kernel version 5.1. It was developed to address the limitations of traditional Linux I/O mechanisms like epoll, select, and aio. These earlier approaches often suffered from high overhead due to system calls, context switches, or inefficient batching, which limited their scalability in handling modern high-throughput and low-latency workloads.
At its core, IO_URING provides a ring-buffer-based mechanism for submitting I/O requests and receiving their completions, eliminating many inefficiencies in older methods. This allows applications to perform non-blocking, asynchronous I/O with minimal kernel involvement, making it particularly suited for applications such as databases, web servers, and file systems.
How does IO_URING work?
IO_URING’s architecture revolves around two primary shared memory ring buffers between user space and the kernel:
- Submission Queue (SQ):
- The SQ is a ring buffer where applications enqueue I/O requests.
- User-space applications write requests directly to the buffer without needing to call into the kernel for each operation.
- The requests describe the type of I/O operation to be performed (e.g., read, write, send, receive).
- Completion Queue (CQ):
- The CQ is another ring buffer where the kernel places the results of completed I/O operations.
- Applications read from the CQ to retrieve the status of their submitted requests.
The interaction between user space and the kernel is simplified:
- The user-space application adds entries to the Submission Queue and notifies the kernel when ready (via a single syscall like io_uring_enter).
- The kernel processes these requests and posts results to the Completion Queue, which the application can read without additional syscalls.
Key Features
- Batching Requests:
- Multiple I/O operations can be submitted in a single system call, significantly reducing syscall overhead.
- Zero-copy I/O:
- Certain operations (like reads and writes) can leverage fixed buffers, avoiding unnecessary data copying between kernel and user space.
- Kernel Offloading:
- The kernel can process requests in the background, allowing the application to continue without waiting.
- Efficient Polling:
- Supports event-driven programming with low-latency polling mechanisms, reducing idle time in high-performance applications.
- Flexibility:
- IO_URING supports a wide range of I/O operations, including file I/O, network I/O, and event notifications.
Code
Let’s get some code examples going to see exactly what we’re dealing with.
First of all, check to see that your kernel supports IO_URING. It should; it's been available since kernel 5.1.
You’ll also need liburing available to you in order to compile these examples.
Library setup
In this first example, we won’t perform any actions; but we’ll setup the library so that we can use these operations. All of our other examples will use this as a base.
We’ll need some basic I/O headers as well as liburing.h.
We initialize our uring queue using io_uring_queue_init.
When we’re finished with the ring, we clean up with io_uring_queue_exit.
Simple Write
In this example, we’ll queue up a write of a string out to a file and that’s it.
First, we need to open the file as usual.
Now we set up the write job. The io_uring_get_sqe function will get us the next available submission queue entry from the job queue. Once we have secured one of these, we fill a vector I/O structure (an iovec) with the details of our data: here it's just the data pointer and length.
Finally, we prepare a vectored write request using io_uring_prep_writev.
We submit the job off to be processed with io_uring_submit.
We can wait for the execution to complete; more powerfully though, we could be off doing other things while the job runs.
In order to wait for the job to finish, we use io_uring_wait_cqe.
We check the result of the job through the io_uring_cqe structure filled by the io_uring_wait_cqe call.
Finally, we mark the uring event as consumed and close the file.
The full example of this can be found here.
Multiple Operations
We can start to see some of the power of this system in this next example. We’ll submit multiple jobs for processing.
We’ve opened a source file for reading in src_fd and a destination file for writing in dest_fd.
So, this is just sequentially executing multiple operations.
The full example of this can be found here.
Asynchronous operations
Finally, we’ll write an example that will process multiple operations in parallel.
A for loop sets up three read jobs, and then all of the requests get submitted for processing in a single call.
Finally, we wait on each of the jobs to finish. The important thing to note here is that we could be busy doing other things rather than just waiting for these jobs to complete.
The entire example of this one can be found here.
Conclusion
IO_URING represents a transformative step in Linux asynchronous I/O, providing unparalleled performance and flexibility for modern applications. By minimizing syscall overhead, enabling zero-copy I/O, and allowing concurrent and batched operations, it has become a vital tool for developers working on high-performance systems.
Through the examples we’ve covered, you can see the practical power of IO_URING, from simple write operations to complex asynchronous processing. Its design not only simplifies high-throughput I/O operations but also opens up opportunities to optimize and innovate in areas like database systems, networking, and file handling.