Cogs and Levers A blog full of technical stuff

Actor Pattern in Rust

Introduction

Concurrency is a cornerstone of modern software development, and the actor pattern is a well-established model for handling concurrent computations. Rust, with its focus on safety, performance, and concurrency, provides an excellent platform for implementing the actor model. In this article, we’ll explore what the actor pattern is, how it works in Rust, and dive into some popular libraries that implement it.

What is the Actor Pattern?

The actor pattern revolves around the concept of “actors,” which are independent, lightweight entities that communicate exclusively through message passing. Each actor encapsulates state and behavior, processing messages asynchronously and maintaining its own isolated state. This model eliminates the need for shared state, reducing the complexity and risks associated with multithreaded programming.

Why Use the Actor Pattern?

  • Isolation: Each actor manages its own state, ensuring safety.
  • Message Passing: Communication happens via asynchronous messages, avoiding direct interactions or locks.
  • Fault Tolerance: Actor hierarchies can implement supervision strategies, enabling automatic recovery from failures.
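To make isolation and message passing concrete before looking at any framework, here is a minimal sketch of an actor built only from the standard library: a thread that owns its state and a channel that delivers messages to it. The names CounterMsg, spawn_counter and demo_counter are hypothetical and exist only for this illustration; real actor libraries add addresses, supervision and async handling on top of the same idea.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

// Messages the counter actor understands (hypothetical names for this sketch).
enum CounterMsg {
    // Add a value to the actor's private total.
    Add(u64),
    // Ask for the current total; the reply travels back over its own channel.
    Get(Sender<u64>),
}

// Spawns the actor on its own thread and returns the handle used to message it.
fn spawn_counter() -> Sender<CounterMsg> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // This state is owned by the actor thread; nothing else can touch it.
        let mut total: u64 = 0;
        // The loop ends once every Sender has been dropped.
        for msg in rx {
            match msg {
                CounterMsg::Add(n) => total += n,
                CounterMsg::Get(reply) => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(total);
                }
            }
        }
    });
    tx
}

// Sends a few messages and returns the total the actor reports.
fn demo_counter() -> u64 {
    let counter = spawn_counter();
    counter.send(CounterMsg::Add(2)).unwrap();
    counter.send(CounterMsg::Add(3)).unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    counter.send(CounterMsg::Get(reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}
```

Messages from a single sender arrive in order, so the Get request is answered only after both Add messages have been applied.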

Libraries

As a basic example for comparison, we’ll create an actor that handles one message “Ping”.

Actix

Actix is the most popular and mature actor framework in Rust. Built on top of tokio, it offers high-performance async I/O along with a robust actor-based architecture.

Features:

  • Lightweight actors with asynchronous message handling.
  • Built-in supervision for error recovery.
  • Excellent integration with web development (actix-web).

Example:

Here’s how to create a simple actor that responds to messages with Actix:

[dependencies]
actix = "0.13.5"

use actix::prelude::*;

struct MyActor;

impl Actor for MyActor {
    type Context = Context<Self>;
}

struct Ping;

impl Message for Ping {
    type Result = String; 
}

impl Handler<Ping> for MyActor {
    type Result = String; 

    fn handle(&mut self, _msg: Ping, _ctx: &mut Context<Self>) -> Self::Result {
        "Pong".to_string()
    }
}

#[actix::main]
async fn main() {
    let addr = MyActor.start();
    let res = addr.send(Ping).await.unwrap();
    println!("Response: {}", res);
}

Breakdown

  • Any Rust type can be an actor; it only needs to implement the Actor trait
    • We’ve defined MyActor for this
  • To handle a specific message, the actor has to provide a Handler<M> implementation
    • The Ping message is defined and handled by MyActor’s handle function
  • The actor is started with MyActor.start(), which returns its address
  • A Ping message is sent to that address with send, and the response is awaited

Riker

Inspired by Akka (Scala’s popular actor framework), Riker is another actor-based framework in Rust. While less active than Actix, Riker focuses on distributed systems and fault tolerance.

Features:

  • Actor supervision strategies.
  • Distributed messaging.
  • Strong typing for messages.

Example:

This example is taken from the Riker GitHub repository:

[dependencies]
riker = "0.4.2"

use std::time::Duration;
use riker::actors::*;

#[derive(Default)]
struct MyActor;

// implement the Actor trait
impl Actor for MyActor {
    type Msg = String;

    fn recv(&mut self,
            _ctx: &Context<String>,
            msg: String,
            _sender: Sender) {

        if msg == "Ping" {
            println!("Pong!");
        } else {
            println!("Received: {}", msg);
        }

    }
}

// start the system and create an actor
fn main() {
    let sys = ActorSystem::new().unwrap();

    let my_actor = sys.actor_of::<MyActor>("my-actor").unwrap();

    my_actor.tell("Ping".to_string(), None);

    std::thread::sleep(Duration::from_millis(500));
}

Breakdown

  • MyActor implements the Actor trait
  • Messages are handled by the recv function
  • An actor system is started with ActorSystem::new()
  • tell is fire-and-forget, so main sleeps briefly at the end to give the actor time to process the message

Xactor

xactor is a more modern and ergonomic actor framework, simplifying async/await integration compared to Actix. xactor is based on async-std.

Example:

This example was taken from xactor’s GitHub README.

[dependencies]
xactor = "0.7.11"
async-trait = "0.1"

use xactor::*;

#[message(result = "String")]
struct Ping;

struct MyActor;

impl Actor for MyActor {}

#[async_trait::async_trait]
impl Handler<Ping> for MyActor {
    async fn handle(&mut self, _ctx: &mut Context<Self>, _: Ping) -> String {
        "Pong".to_string()
    }
}

#[xactor::main]
async fn main() -> Result<()> {
    // Start actor and get its address
    let addr = MyActor.start().await?;

    let res = addr.call(Ping).await?;
    println!("{}", res);

    Ok(())
}

Breakdown

  • A MyActor actor and a Ping message are defined
  • The handle function is implemented for MyActor
  • Because handle is an async function, the caller can await the result of addr.call(Ping)

Advantages of the Actor Pattern in Rust

Rust’s concurrency features and the actor model complement each other well:

  • Memory Safety: Each actor owns its state, so there is no shared mutable data to race on, and Rust’s ownership and borrowing rules enforce that isolation at compile time.
  • Scalability: Asynchronous message passing allows scaling systems efficiently.
  • Fault Tolerance: Supervision hierarchies help manage errors and recover gracefully.

When to Use the Actor Pattern

The actor pattern is a good fit for:

  • Distributed Systems: Where isolated units of computation need to communicate across nodes.
  • Concurrent Systems: That require fine-grained message handling without shared state.
  • Web Applications: With complex stateful backends (e.g., using Actix-Web).

Alternatives to the Actor Pattern

While powerful, the actor model isn’t always necessary. Rust offers other concurrency paradigms:

  • Channels: Using std::sync::mpsc or tokio::sync::mpsc for message passing.
  • Shared-State Concurrency: Leveraging Arc<Mutex<T>> to manage shared state.
  • Futures and Tasks: Directly working with Rust’s async ecosystem.
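For contrast with message passing, the shared-state alternative can be sketched with nothing but the standard library. The function name shared_counter and its parameters are made up for this illustration; the point is only that every thread touches the same value and must take the lock to do so.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Increments a shared counter from several threads and returns the final value.
fn shared_counter(threads: usize, increments_per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Each thread gets its own reference-counted handle to the same Mutex.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments_per_thread {
                    // The lock is held only for the duration of this statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

This is often the simpler choice when the shared data is small and contention is low; an actor becomes more attractive as the state and the set of operations on it grow.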

Conclusion

The actor pattern is alive and well in Rust, with libraries like Actix, Riker, and xactor making it accessible to developers. Whether you’re building distributed systems, scalable web applications, or concurrent computation engines, the actor model can simplify your design while leveraging Rust’s safety and performance guarantees.

Building a Daemon using Rust

Introduction

Daemons — long-running background processes — are the backbone of many server applications and system utilities. In this tutorial, we’ll explore how to create a robust daemon using Rust, incorporating advanced concepts like double forking, setsid, signal handling, working directory management, file masks, and standard file descriptor redirection.

If you’re familiar with my earlier posts on building CLI tools and daemon development in C, this article builds on those concepts, showing how Rust can achieve similar low-level control while leveraging its safety and modern tooling.

What Is a Daemon?

A daemon is a background process that runs independently of user interaction. It often starts at system boot and remains running to perform specific tasks, such as handling requests, monitoring resources, or providing services.

Key Features of a Daemon

  1. Independence from a terminal: It should not terminate if the terminal session closes.
  2. Clean shutdown: Handle signals gracefully for resource cleanup.
  3. File handling: Operate with specific file permissions and manage standard descriptors.

Rust, with its safety guarantees and powerful ecosystem, is an excellent choice for implementing these processes.

Setup

First, we’ll need to set up some dependencies.

Add these to your Cargo.toml file:

[dependencies]
log = "0.4"
env_logger = "0.11.5"
nix = { version = "0.29.0", features = ["process", "fs", "signal"] }
signal-hook = "0.3"

Daemonization in Rust

The first step in daemonizing a process is separating it from the terminal and creating a new session. This involves double forking and calling setsid.

use nix::sys::stat::{umask, Mode};
use nix::sys::signal::{signal, SigHandler, Signal};
use std::fs::File;
use std::os::unix::io::AsRawFd;
use std::env;
use nix::unistd::{ForkResult, fork};

pub unsafe fn daemonize() -> Result<(), Box<dyn std::error::Error>> {
    // First fork
    match fork()? {
        ForkResult::Parent { .. } => std::process::exit(0),
        ForkResult::Child => {}
    }

    // Create a new session
    nix::unistd::setsid()?;

    // Ignore SIGHUP
    unsafe {
        signal(Signal::SIGHUP, SigHandler::SigIgn)?;
    }

    // Second fork
    match fork()? {
        ForkResult::Parent { .. } => std::process::exit(0),
        ForkResult::Child => {}
    }

    // Set working directory to root
    env::set_current_dir("/")?;

    // Set file mask
    umask(Mode::empty());

    // Close and reopen standard file descriptors
    close_standard_fds();

    Ok(())
}

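fn close_standard_fds() {
    // Open /dev/null for both reading and writing so it can back all three streams
    let dev_null = std::fs::OpenOptions::new()
        .read(true)
        .write(true)
        .open("/dev/null")
        .unwrap();

    // dup2 closes the target descriptor itself before duplicating onto it,
    // so STDIN (0), STDOUT (1) and STDERR (2) need no separate close call
    for fd in 0..3 {
        nix::unistd::dup2(dev_null.as_raw_fd(), fd).unwrap();
    }

    // If /dev/null itself landed on 0, 1 or 2 (a standard stream was already
    // closed), keep it open instead of letting the File close it on drop
    if dev_null.as_raw_fd() <= 2 {
        std::mem::forget(dev_null);
    }
}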

Notice the usage of unsafe. Because we are reaching out to low-level system calls such as fork and signal, which nix exposes as unsafe functions, we need to bypass some of the safety that Rust provides by putting this code into unsafe blocks.

Whenever using unsafe in Rust:

  • Justify its Use: Ensure it is necessary, such as for interacting with low-level system calls.
  • Minimize its Scope: Encapsulate unsafe operations in a well-tested function to isolate potential risks.
  • Document Clearly: Explain why unsafe is needed and how the function remains safe in practice.
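The three guidelines above can be illustrated with a deliberately small, contrived sketch: a safe function that wraps a single unsafe operation and documents why it is sound. The function first_byte is hypothetical and is not part of the daemon; in real code slice.first() would do the same job without unsafe.

```rust
/// Returns the first byte of `bytes`, or `None` if the slice is empty.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        return None;
    }
    // SAFETY: the slice was checked to be non-empty above, so index 0 is in bounds.
    Some(unsafe { *bytes.get_unchecked(0) })
}
```

Callers see an ordinary safe function, the unsafe block covers exactly one expression, and the SAFETY comment records the invariant that makes it valid.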

Handling Signals

Daemons need to handle signals for proper shutdown and cleanup. We’ll use the signal-hook crate for managing signals.

use signal_hook::iterator::Signals;
use std::thread;

pub fn setup_signal_handlers() -> Result<(), Box<dyn std::error::Error>> {
    // Capture termination and interrupt signals
    let mut signals = Signals::new(&[signal_hook::consts::SIGTERM, signal_hook::consts::SIGINT])?;

    thread::spawn(move || {
        for sig in signals.forever() {
            match sig {
                signal_hook::consts::SIGTERM | signal_hook::consts::SIGINT => {
                    log::info!("Received termination signal. Shutting down...");
                    std::process::exit(0);
                }
                _ => {}
            }
        }
    });

    Ok(())
}
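One thing to be aware of is that std::process::exit terminates the process without running destructors, so any cleanup has to happen before it is called. A common alternative is to have the signal thread set a shared flag that the main loop checks. The sketch below shows that shape using only the standard library; simulate_signal is a stand-in (an assumption for this illustration) for the thread that iterates over signals.forever(), and run_until_shutdown is a hypothetical main loop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Runs a work loop until `shutdown` is set, returning the number of iterations completed.
fn run_until_shutdown(shutdown: Arc<AtomicBool>) -> u32 {
    let mut iterations = 0;
    while !shutdown.load(Ordering::SeqCst) {
        iterations += 1;
        thread::sleep(Duration::from_millis(10));
    }
    // Cleanup (flushing logs, removing a PID file, ...) would go here.
    iterations
}

// Stand-in for the signal thread: requests shutdown after a short delay.
fn simulate_signal(shutdown: Arc<AtomicBool>, delay: Duration) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        thread::sleep(delay);
        shutdown.store(true, Ordering::SeqCst);
    })
}
```

In the daemon, the body of the SIGTERM/SIGINT match arm would store true into the flag instead of calling exit, and main would return normally once its loop ends.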

Managing the Environment

A daemon should start in a safe, predictable state.

Working Directory

Change the working directory to a known location, typically the root directory (/).

env::set_current_dir("/")?;

File Mask

Set the umask to 0 so that the permissions the daemon requests when creating files are applied as-is, rather than being restricted by a mask inherited from the parent process.

// Set file mask
umask(Mode::empty());
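The effect of the mask is simple bit arithmetic: any permission bit set in the umask is cleared from the mode requested at file creation. The helper below is a hypothetical illustration of that rule, not something the daemon needs to call.

```rust
// Permission bits a new file actually receives: the requested mode with the umask bits cleared.
fn effective_mode(requested: u32, umask: u32) -> u32 {
    requested & !umask
}
```

With a typical inherited umask of 0o022, a file requested with mode 0o666 ends up as 0o644; with a umask of 0 it keeps the full 0o666.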

Putting It All Together

Integrate the daemonization process with signal handling and environment setup in main.rs:

mod daemon;
mod signals;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize logging
    env_logger::init();

    log::info!("Starting daemonization process...");

    // Daemonize the process
    unsafe { daemon::daemonize()?; }

    // Set up signal handling
    signals::setup_signal_handlers()?;

    // Main loop
    loop {
        log::info!("Daemon is running...");
        std::thread::sleep(std::time::Duration::from_secs(5));
    }
}

Because we marked the daemonize function as unsafe, we must wrap it in unsafe to use it here.

Advanced Features

Signal Handlers for Additional Signals

Add handlers for non-critical signals like SIGCHLD, SIGTTOU, or SIGTTIN.

use nix::sys::signal::{signal, SigHandler, Signal};

unsafe {
    signal(Signal::SIGCHLD, SigHandler::SigIgn)?;
    signal(Signal::SIGTTOU, SigHandler::SigIgn)?;
    signal(Signal::SIGTTIN, SigHandler::SigIgn)?;
}

Integration with systemd

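To run the daemon with systemd, create a service file. Because the process forks into the background, the service is declared with Type=forking so that systemd tracks the forked daemon rather than the parent process that exits:

[Unit]
Description=Logger Daemon
After=network.target

[Service]
Type=forking
ExecStart=/path/to/logger_daemon
Restart=always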

[Install]
WantedBy=multi-user.target

Conclusion

With the foundational concepts and Rust’s ecosystem, you can build robust daemons that integrate seamlessly with the operating system. The combination of double forking, signal handling, and proper environment management ensures your daemon behaves predictably and safely.

A full example of this project is up on my GitHub.

Building a CLI Tool using Rust

Introduction

Command-Line Interface (CLI) tools are fundamental for developers, system administrators, and power users alike, offering efficient ways to perform tasks, automate processes, and manage systems. Rust is a popular choice for creating CLI tools due to its high performance, reliability, and modern tooling support.

In this tutorial, we’ll walk through building a simple Rust CLI tool that flips the case of a given string—converting uppercase letters to lowercase and vice versa. By exposing this small function through the command line, we’ll cover Rust’s basics for CLI development, including handling arguments, configuration files, error handling, and more.

Overview

Here’s the roadmap for this tutorial:

  1. Setting Up: Create a scalable project directory.
  2. Parsing Command-Line Arguments: Handle CLI inputs using Rust’s std::env::args and the clap crate.
  3. Adding Configuration: Set up external configuration options with serde.
  4. Using Standard Streams: Handle standard input and output for versatile functionality.
  5. Adding Logging: Use logging to monitor and debug the application.
  6. Error Handling: Make errors descriptive and friendly.
  7. Testing: Write unit and integration tests.
  8. Distribution: Build and distribute the CLI tool.

Setting Up

Let’s start by creating a basic Rust project structured to support scalability and best practices.

Creating a Rust Project

Open a terminal and create a new project:

cargo new text_tool
cd text_tool

This initializes a Rust project with a basic src directory containing main.rs. However, rather than placing all our code in main.rs, let’s structure our project with separate modules and a clear src layout.

Directory Structure

To make our project modular and scalable, let’s organize our project directory as follows:

text_tool
├── src
│   ├── main.rs    # main entry point of the program
│   ├── lib.rs     # main library file
│   ├── config.rs  # configuration-related code
│   └── cli.rs     # command-line parsing logic
├── tests
│   └── integration_test.rs # integration tests
└── Cargo.toml

  • main.rs: The primary entry point, managing the CLI tool setup and orchestrating modules.
  • lib.rs: The library file, which makes our code reusable.
  • config.rs, cli.rs: Modules for specific functions—parsing CLI arguments, handling configuration.

This structure keeps our code modular, organized, and easy to test and maintain. Throughout the rest of the tutorial, we’ll add components to each module, implementing new functionality step-by-step.

Parsing the Command Line

Rust’s std::env::args allows us to read command-line arguments directly. However, for robust parsing, validation, and documentation, we’ll use the clap crate, a powerful library for handling CLI arguments.

Using std::env::args

To explore the basics, let’s try out std::env::args by updating main.rs to print any arguments provided by the user:

// main.rs
use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    println!("{:?}", args);
}

Running cargo run -- hello world will output the full list of command-line arguments, with the first entry as the binary name itself.
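To see why a dedicated parser quickly becomes attractive, here is a small sketch of doing the first step by hand: separating the sub-command from the rest of the arguments. The function split_command is hypothetical and only illustrates the idea; it performs no validation and generates no help text.

```rust
// Splits raw arguments into a sub-command and its remaining arguments.
// The first element is skipped because it is the binary name.
fn split_command<I>(args: I) -> Option<(String, Vec<String>)>
where
    I: IntoIterator<Item = String>,
{
    let mut iter = args.into_iter().skip(1);
    // No sub-command given: let the caller decide how to report it.
    let command = iter.next()?;
    Some((command, iter.collect()))
}
```

In a real program it would be called as split_command(std::env::args()). Everything after this point (checking argument counts, optional values, usage messages) would still have to be written manually.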

Switching to clap

While std::env::args works, clap makes argument parsing cleaner and adds support for help messages, argument validation, and more.

Add clap to your project by updating Cargo.toml:

[dependencies]
clap = { version = "4.0", features = ["derive"] }

Then, update src/cli.rs to define the CLI arguments and sub-commands:

// src/cli.rs
use clap::{Parser, Subcommand};

// Top-level arguments; clap generates the parser and help text from this struct
#[derive(Parser)]
#[command(name = "Text Tool")]
#[command(about = "A simple CLI tool for text transformations", long_about = None)]
pub struct Cli {
    #[command(subcommand)]
    pub command: Commands,
}

// One variant per sub-command; plain fields become positional arguments
#[derive(Subcommand)]
pub enum Commands {
    Uppercase { input: String, output: Option<String> },
    Lowercase { input: String, output: Option<String> },
    Replace { input: String, from: String, to: String, output: Option<String> },
    Count { input: String },
}

In main.rs, parse the arguments and dispatch on the chosen sub-command:

// main.rs
mod cli;

// The Parser trait must be in scope to call Cli::parse()
use clap::Parser;
use cli::{Cli, Commands};

fn main() {
    let args = Cli::parse();
    match &args.command {
        Commands::Uppercase { input, output } => { /* function call */ },
        Commands::Lowercase { input, output } => { /* function call */ },
        Commands::Replace { input, from, to, output } => { /* function call */ },
        Commands::Count { input } => { /* function call */ },
    }
}

Adding Configuration

To add configuration flexibility, we’ll use the serde crate to allow loading options from an external file, letting users configure input and output file paths, for example.

Add serde and serde_json to Cargo.toml:

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"

Define the configuration in src/config.rs:

// src/config.rs
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
pub struct Config {
    pub input_file: Option<String>,
    pub output_file: Option<String>,
}

The configuration loader (not shown here) looks for a config.toml file with a structure like:

default_output = "output.txt"

Using Standard Streams

Like any well-behaved Unix tool, ours takes advantage of the standard streams (STDIN, STDOUT, and STDERR) so that users can utilise pipes and redirection and compose it with other tools.

In the case of this application, if we don’t receive an input via the command line parameters, the tool will assume that input is being delivered over STDIN:

use std::fs::File;
use std::io::{self, Read};

/// Reads content from a file if a path is provided, otherwise reads from STDIN.
fn read_input(input_path: Option<&str>) -> Result<String, io::Error> {
    let mut content = String::new();

    if let Some(path) = input_path {
        // Read from file
        let mut file = File::open(path)?;
        file.read_to_string(&mut content)?;
    } else {
        // Read from STDIN
        io::stdin().read_to_string(&mut content)?;
    }

    Ok(content)
}

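read_input handles both scenarios for us.

Diagnostics go through our logging facilities. Note that env_logger writes to STDERR by default, which keeps STDOUT free for the tool’s actual output when it is used in a pipeline.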
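For the tool's actual results, a counterpart to read_input can follow the same pattern: write to a file when a path is given and fall back to STDOUT otherwise. The sketch below is one possible shape, not the project's actual code; write_output and write_to are hypothetical names.

```rust
use std::fs::File;
use std::io::{self, Write};

// Writes `content` to the file at `output_path` if given, otherwise to STDOUT.
fn write_output(output_path: Option<&str>, content: &str) -> io::Result<()> {
    match output_path {
        Some(path) => write_to(File::create(path)?, content),
        None => write_to(io::stdout().lock(), content),
    }
}

// Generic over the destination so the same code serves files, STDOUT, and in-memory buffers.
fn write_to<W: Write>(mut writer: W, content: &str) -> io::Result<()> {
    writer.write_all(content.as_bytes())?;
    writer.flush()
}
```

Keeping the writing logic generic over Write also makes it easy to test, since a Vec<u8> can stand in for the real destination.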

Adding Logging

Using the log macros, we can emit messages classified into different severities. With env_logger these are written to STDERR by default. The severities are:

Level   Description
trace   Very fine-grained detail for tracing the flow of execution.
debug   Useful for debugging; provides insights into internal states.
info    General information about what the tool is doing.
warn    Indicates a potential problem that isn’t necessarily critical.
error   Logs critical issues that need immediate attention.

These log levels allow developers to adjust the verbosity of the logs based on the environment or specific needs. Here’s how we can add logging to our Rust CLI.

Initialize the logger

At the start of your application, initialize the env_logger. This reads an environment variable (usually RUST_LOG) to set the desired log level.

use log::{info, warn, error, debug, trace};

fn main() {
    env_logger::init();

    info!("Starting the text tool application");

    // Example logging at different levels
    trace!("This is a trace log - very detailed.");
    debug!("This is a debug log - useful for development.");
    warn!("This is a warning - something unexpected happened, but it’s not critical.");
    error!("This is an error - something went wrong!");
}

Setting log levels

With env_logger, you can control the logging level via the RUST_LOG environment variable. This lets users or developers dynamically set the level of detail they want to see without changing the code.

RUST_LOG=info ./text_tool

Using Log Messages in Functions

Add log messages throughout your functions to provide feedback on various stages or states of the process. Here’s how logging can be added to a text transformation function:

pub fn uppercase(input: &str) -> Result<String, std::io::Error> {
    log::debug!("Attempting to read input from '{}'", input);

    let content = std::fs::read_to_string(input)?;
    log::info!("Converting text to uppercase");

    let result = content.to_uppercase();

    log::debug!("Finished transformation to uppercase");
    Ok(result)
}

Environment-Specific Logging

During development, you might want debug or trace logs to understand the application flow. In production, however, you might set the log level to info or warn to avoid verbose output. The env_logger configuration allows for this flexibility without code changes.

Why Logging Matters

Logging gives developers and users insight into the application’s behavior and status, helping identify issues, track performance, and understand what the tool is doing. This flexibility and transparency in logging make for a more robust, user-friendly CLI tool.

Using these logging best practices will make your Rust CLI tool easier to debug, monitor, and maintain, especially as it grows or gets deployed to different environments.

Error Handling

In a CLI tool, it’s crucial to handle errors gracefully and present clear messages to users. Rust’s Result type makes it easy to propagate errors up the call chain, where they can be handled in a central location. We’ll log error messages to help users and developers understand what went wrong.

Define a Custom Error Type

Defining a custom error type allows you to capture specific error cases and add contextual information.

use std::fmt;

#[derive(Debug)]
enum CliError {
    Io(std::io::Error),
    Config(config::ConfigError),
    MissingArgument(String),
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            CliError::Io(err) => write!(f, "I/O error: {}", err),
            CliError::Config(err) => write!(f, "Configuration error: {}", err),
            CliError::MissingArgument(arg) => write!(f, "Missing required argument: {}", arg),
        }
    }
}

impl From<std::io::Error> for CliError {
    fn from(err: std::io::Error) -> CliError {
        CliError::Io(err)
    }
}

Returning Errors from Functions

In each function, use Result<T, CliError> to propagate errors. For example, in a function reading from a file or STDIN, return a Result so errors bubble up:

// The Read trait provides read_to_string for both File and Stdin
use std::io::Read;

fn read_input(input_path: Option<&str>) -> Result<String, CliError> {
    let mut content = String::new();
    
    if let Some(path) = input_path {
        let mut file = std::fs::File::open(path)?;
        file.read_to_string(&mut content)?;
    } else {
        std::io::stdin().read_to_string(&mut content)?;
    }

    Ok(content)
}
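What makes the ? operator work here is the From<std::io::Error> implementation: when an I/O call fails, ? converts the error into CliError before returning it. The following self-contained sketch shows that mechanism in isolation. It uses a trimmed-down copy of the error type (the Config variant is left out so that no external crate is needed), and read_file is a hypothetical helper.

```rust
use std::fmt;
use std::fs;
use std::io;

// Trimmed-down version of the error type above, without the Config variant.
#[derive(Debug)]
enum CliError {
    Io(io::Error),
    MissingArgument(String),
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            CliError::Io(err) => write!(f, "I/O error: {}", err),
            CliError::MissingArgument(arg) => write!(f, "Missing required argument: {}", arg),
        }
    }
}

// Lets CliError be used wherever a standard error type is expected.
impl std::error::Error for CliError {}

impl From<io::Error> for CliError {
    fn from(err: io::Error) -> CliError {
        CliError::Io(err)
    }
}

// `?` calls From::from on the io::Error, turning it into CliError::Io.
fn read_file(path: Option<&str>) -> Result<String, CliError> {
    let path = path.ok_or_else(|| CliError::MissingArgument("input file".to_string()))?;
    Ok(fs::read_to_string(path)?)
}
```

A missing path produces CliError::MissingArgument, while a path that cannot be read produces CliError::Io, without any explicit map_err at the call site.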

Logging Errors and Returning ExitCode

In main.rs, handle errors centrally. If an error occurs, log it at an appropriate level and exit with a non-zero status code. For critical issues, use error!, while warn! is suitable for non-fatal issues.

use std::process::ExitCode;
use log::{error, warn};

fn main() -> ExitCode {
    env_logger::init();

    match run() {
        Ok(_) => {
            log::info!("Execution completed successfully");
            ExitCode::SUCCESS
        }
        Err(err) => {
            // Log the error based on its type or severity
            match err {
                CliError::MissingArgument(_) => warn!("{}", err),
                _ => error!("{}", err),
            }

            ExitCode::FAILURE
        }
    }
}

fn run() -> Result<(), CliError> {
    // Your main application logic here
    Ok(())
}

Presenting Error Messages to the User

By logging errors at different levels, users get clear, contextual feedback. Here’s an example scenario where an error is encountered:

fn uppercase(input: Option<&str>) -> Result<String, CliError> {
    let input_path = input.ok_or_else(|| CliError::MissingArgument("input file".to_string()))?;
    let content = read_input(Some(input_path))?;
    Ok(content.to_uppercase())
}

Testing

To ensure our CLI tool functions correctly, we’ll set up both unit tests and integration tests. Unit tests allow us to validate individual transformation functions, while integration tests test the CLI’s behavior from end to end.

Testing Core Functions

In Rust, unit tests typically go in the same file as the function they’re testing. Since our main transformation functions are in src/lib.rs, we’ll add unit tests there.

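Here’s an example of how to test the uppercase and replace functions. These tests assume versions of the functions that transform an in-memory string and return a Result: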

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_uppercase() {
        let input = "hello world";
        let expected = "HELLO WORLD";
        
        let result = uppercase(input).unwrap();
        
        assert_eq!(result, expected);
    }

    #[test]
    fn test_replace() {
        let input = "hello world";
        let from = "world";
        let to = "Rust";
        let expected = "hello Rust";
        
        let result = replace(input, from, to).unwrap();
        
        assert_eq!(result, expected);
    }
}

Each test function:

  • Calls the transformation function with specific input.
  • Asserts that the result matches the expected output, ensuring each function behaves correctly in isolation.
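For reference, functions with the shape those tests expect could look like the sketch below. This is an assumption about their signatures, not the project's actual code: both operate purely on in-memory strings (unlike the file-reading uppercase shown in the logging section) and use a plain String as the error type to stay dependency-free.

```rust
// Converts every character to uppercase. This cannot fail; it returns a Result
// only so that it has the same shape as the other transformations.
pub fn uppercase(input: &str) -> Result<String, String> {
    Ok(input.to_uppercase())
}

// Replaces every occurrence of `from` with `to`, rejecting an empty pattern.
pub fn replace(input: &str, from: &str, to: &str) -> Result<String, String> {
    if from.is_empty() {
        return Err("pattern to replace must not be empty".to_string());
    }
    Ok(input.replace(from, to))
}
```

Keeping the transformations free of I/O is what makes them easy to unit test; reading and writing files can then be handled separately by the CLI layer.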

Integration Tests for End-to-End Behavior

Integration tests verify that the CLI as a whole works as expected, handling command-line arguments, file I/O, and expected outputs. These tests go in the tests/ directory, with each test file representing a suite of related tests.

Let’s create an integration test in tests/integration_test.rs:

use assert_cmd::Command;

#[test]
fn test_uppercase_command() {
    let mut cmd = Command::cargo_bin("text_tool").unwrap();
    
    cmd.arg("uppercase").arg("hello.txt").assert().success();
}

#[test]
fn test_replace_command() {
    let mut cmd = Command::cargo_bin("text_tool").unwrap();
    
    cmd.arg("replace").arg("hello.txt").arg("hello").arg("Rust").assert().success();
}

In this example:

  • We use the assert_cmd crate, which makes it easy to test command-line applications by running them as subprocesses.
  • Each test case calls the CLI with arguments to simulate user input and checks that the process completes successfully (assert().success()).
  • Additional assertions can check the output to ensure that the CLI’s behavior matches expectations.

Testing for Errors

We should also verify that errors are handled correctly, showing meaningful messages without crashing. This is especially useful for testing scenarios where users might provide invalid inputs or miss required arguments.

Here’s an example of testing an expected error:

#[test]
fn test_missing_argument_error() {
    let mut cmd = Command::cargo_bin("text_tool").unwrap();
    
    cmd.arg("replace").arg("hello.txt")  // Missing "from" and "to" arguments
        .assert()
        .failure()
        .stderr(predicates::str::contains("Missing required argument"));
}

This test:

  • Runs the CLI without the necessary arguments.
  • Asserts that the command fails (.failure()) and that the error message contains a specific string. The predicates crate is handy here for asserting on specific error messages.

Snapshot Testing for Outputs

Snapshot testing is useful for CLI tools that produce consistent, predictable output. A snapshot test compares the tool’s output to a saved “snapshot” and fails if the output changes unexpectedly.

Using the insta crate for snapshot testing:

use assert_cmd::Command;
use insta::assert_snapshot;

#[test]
fn test_uppercase_output() {
    let mut cmd = Command::cargo_bin("text_tool").unwrap();
    let output = cmd.arg("uppercase").arg("hello.txt").output().unwrap();

    assert_snapshot!(String::from_utf8_lossy(&output.stdout));
}

This test:

  • Runs the uppercase command and captures its output.
  • Compares the output to a stored snapshot, failing if they don’t match. This approach is excellent for catching unexpected changes in output format.

Running Tests

To run all tests (both unit and integration), use:

cargo test

If you’re using assert_cmd or insta, add them as development dependencies in Cargo.toml:

[dev-dependencies]
assert_cmd = "2.0"
insta = "1.16"
predicates = "2.1"  # For testing error messages

Distribution

Distributing your Rust CLI tool doesn’t need to be complicated. Here’s a simple way to package it so that others can easily download and use it.

Build the Release Binary

First, compile a release version of your application. The release build optimizes for performance, making it faster and smaller.

cargo build --release

This command creates an optimized binary in target/release/. The resulting file (e.g., text_tool on Linux/macOS or text_tool.exe on Windows) is your compiled CLI tool, ready for distribution.

Distribute the Binary Directly

For quick sharing, you can simply share the binary file. Make sure it’s compiled for the target platform (Linux, macOS, or Windows) that your users need.

  1. Zip the Binary: Compress the binary into a .zip or .tar.gz archive so users can download and extract it easily.
     zip text_tool.zip target/release/text_tool

  2. Add Instructions: In the same directory as your binary, add a README.md or INSTALL.txt file with basic instructions on how to use and run the tool.

Publishing on GitHub Releases

If you want to make the tool available for a broader audience, consider uploading it to GitHub. Here’s a quick process:

  1. Create a GitHub Release: Go to your GitHub repository and click Releases > Draft a new release.

  2. Upload the Binary: Attach your zipped binary (like text_tool.zip) to the release.

  3. Add a Release Note: Include a description of the release, any new features, and basic installation instructions.

Cross-Platform Binaries (Optional)

To make your tool available on multiple platforms, consider cross-compiling. Each target must be installed first with rustup target add <target>, and a suitable linker for that target has to be available:

  • For Linux:
    cargo build --release --target x86_64-unknown-linux-musl
  • For Windows:
    cargo build --release --target x86_64-pc-windows-gnu
  • For macOS: Run the default release build on macOS.

Putting it all together

The full code for a text_tool application written in Rust can be found in my GitHub repository here.

It covers most of the concepts discussed here and should give you a solid start on creating your own CLI apps.

State Machines

Introduction

State machines are essential in software for managing systems with multiple possible states and well-defined transitions. Often used in networking protocols, command processing, or user interfaces, state machines help ensure correct behavior by enforcing rules on how a program can transition from one state to another based on specific inputs or events.

In Rust, enums and pattern matching make it straightforward to create robust state machines. An enum guarantees that the machine is always in exactly one of the declared states, and a single match expression spells out which transitions are allowed, reducing errors that can arise in more loosely typed languages. In this article, we’ll explore how to design a state machine in Rust that’s both expressive and type-safe, with a concrete example of a networking protocol.

Setting Up the State Machine in Rust

The first step is to define the various states. Using Rust’s enum, we can represent each possible state within our state machine. For this example, let’s imagine we’re modeling a simple connection lifecycle for a network protocol.

Here’s our ConnectionState enum:

enum ConnectionState {
    Disconnected,
    Connecting,
    Connected,
    Error,
}

Each variant represents a specific state that our connection could be in. In a real-world application, you could add more states or include additional information within each state, but for simplicity, we’ll focus on these four.
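As a sketch of what "additional information within each state" could look like, the variant below attaches data to three of the four states. The field names (attempt, peer, reason) and the describe helper are illustrative assumptions; the rest of this article continues with the plain, data-free enum.

```rust
use std::net::SocketAddr;

// Variant of the enum where states carry their own data (field names are illustrative).
#[derive(Debug, PartialEq)]
enum ConnectionState {
    Disconnected,
    Connecting { attempt: u32 },
    Connected { peer: SocketAddr },
    Error { reason: String },
}

// Produces a short human-readable summary of a state.
fn describe(state: &ConnectionState) -> String {
    match state {
        ConnectionState::Disconnected => "disconnected".to_string(),
        ConnectionState::Connecting { attempt } => format!("connecting (attempt {})", attempt),
        ConnectionState::Connected { peer } => format!("connected to {}", peer),
        ConnectionState::Error { reason } => format!("error: {}", reason),
    }
}
```

Because the data lives inside the variant, a peer address can only be read while the machine is actually in the Connected state.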

Defining Transitions

Next, let’s define a transition function. This function will dictate the rules for how each state can move to another based on events. We’ll introduce another enum, Event, to represent the various triggers that cause state transitions:

enum Event {
    StartConnection,
    ConnectionSuccessful,
    ConnectionFailed,
    Disconnect,
}

Our transition function will take in the current state and an event, then use pattern matching to determine the next state.

impl ConnectionState {
    fn transition(self, event: Event) -> ConnectionState {
        match (self, event) {
            (ConnectionState::Disconnected, Event::StartConnection) => ConnectionState::Connecting,
            (ConnectionState::Connecting, Event::ConnectionSuccessful) => ConnectionState::Connected,
            (ConnectionState::Connecting, Event::ConnectionFailed) => ConnectionState::Error,
            (ConnectionState::Connected, Event::Disconnect) => ConnectionState::Disconnected,
            (ConnectionState::Error, Event::Disconnect) => ConnectionState::Disconnected,
            // No transition possible, remain in the current state
            (state, _) => state,
        }
    }
}

This function defines the valid state transitions:

  • If we’re Disconnected and receive a StartConnection event, we transition to Connecting.
  • If we’re Connecting and successfully connect, we move to Connected.
  • If a connection attempt fails, we transition to Error.
  • If we’re Connected or in an Error state and receive a Disconnect event, we return to Disconnected.

Any invalid state-event pair defaults to remaining in the current state.
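The rules above can be checked with a few assertions. This is a self-contained sketch that repeats the enums and the transition function; the derives (Debug, Clone, Copy, PartialEq) are added here as an assumption so that states can be compared and printed:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConnectionState {
    Disconnected,
    Connecting,
    Connected,
    Error,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    StartConnection,
    ConnectionSuccessful,
    ConnectionFailed,
    Disconnect,
}

impl ConnectionState {
    fn transition(self, event: Event) -> ConnectionState {
        match (self, event) {
            (ConnectionState::Disconnected, Event::StartConnection) => ConnectionState::Connecting,
            (ConnectionState::Connecting, Event::ConnectionSuccessful) => ConnectionState::Connected,
            (ConnectionState::Connecting, Event::ConnectionFailed) => ConnectionState::Error,
            (ConnectionState::Connected, Event::Disconnect) => ConnectionState::Disconnected,
            (ConnectionState::Error, Event::Disconnect) => ConnectionState::Disconnected,
            // Any other pair leaves the state unchanged.
            (state, _) => state,
        }
    }
}

fn main() {
    // A valid transition moves to the next state.
    assert_eq!(
        ConnectionState::Disconnected.transition(Event::StartConnection),
        ConnectionState::Connecting
    );
    // A failed attempt while connecting ends in Error.
    assert_eq!(
        ConnectionState::Connecting.transition(Event::ConnectionFailed),
        ConnectionState::Error
    );
    // An invalid pair is ignored: we cannot "succeed" before starting.
    assert_eq!(
        ConnectionState::Disconnected.transition(Event::ConnectionSuccessful),
        ConnectionState::Disconnected
    );
    println!("all transition checks passed");
}
```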

Implementing Transitions and Handling Events

To make the state machine operate, let’s add a Connection struct that holds the current state and handles the transitions based on incoming events.

struct Connection {
    state: ConnectionState,
}

impl Connection {
    fn new() -> Self {
        Connection {
            state: ConnectionState::Disconnected,
        }
    }

    fn handle_event(&mut self, event: Event) {
        self.state = self.state.transition(event);
    }
}

Now, we can initialize a connection and handle events:

fn main() {
    let mut connection = Connection::new();

    connection.handle_event(Event::StartConnection);
    println!("Current state: {:?}", connection.state); // Should be Connecting

    connection.handle_event(Event::ConnectionSuccessful);
    println!("Current state: {:?}", connection.state); // Should be Connected

    connection.handle_event(Event::Disconnect);
    println!("Current state: {:?}", connection.state); // Should be Disconnected
}

With this setup, we have a fully functional state machine that moves through a predictable set of states based on events. Pattern matching makes every permitted transition explicit, and any other state-event pair simply leaves the state unchanged.

Other Usage

While our connection example is simple, state machines are invaluable for more complex flows, like command processing in a CLI or a network protocol. Imagine a scenario where we have commands that can only run under certain conditions.

Let’s say we have a simple command processing machine that recognizes two commands: Init and Process. The machine can only start processing after initialization. Here’s what the implementation might look like:

enum CommandState {
    Idle,
    Initialized,
    Processing,
}

enum CommandEvent {
    Initialize,
    StartProcessing,
    FinishProcessing,
}

impl CommandState {
    fn transition(self, event: CommandEvent) -> CommandState {
        match (self, event) {
            (CommandState::Idle, CommandEvent::Initialize) => CommandState::Initialized,
            (CommandState::Initialized, CommandEvent::StartProcessing) => CommandState::Processing,
            (CommandState::Processing, CommandEvent::FinishProcessing) => CommandState::Initialized,
            (state, _) => state, // Remain in the current state if transition is invalid
        }
    }
}

With the same transition approach, we could build an interface to handle user commands, enforcing the correct order for initializing and processing. This could be extended to handle error states or additional command flows as needed.
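One way such an interface might look is sketched below. The parse_command and run_commands helpers and the command words ("init", "process", "done") are hypothetical and chosen only for illustration; the derives are added so the final state can be printed and compared:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CommandState {
    Idle,
    Initialized,
    Processing,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum CommandEvent {
    Initialize,
    StartProcessing,
    FinishProcessing,
}

impl CommandState {
    fn transition(self, event: CommandEvent) -> CommandState {
        match (self, event) {
            (CommandState::Idle, CommandEvent::Initialize) => CommandState::Initialized,
            (CommandState::Initialized, CommandEvent::StartProcessing) => CommandState::Processing,
            (CommandState::Processing, CommandEvent::FinishProcessing) => CommandState::Initialized,
            (state, _) => state, // invalid transitions are ignored
        }
    }
}

// Hypothetical mapping from a line of user input to an event.
fn parse_command(input: &str) -> Option<CommandEvent> {
    match input.trim().to_lowercase().as_str() {
        "init" => Some(CommandEvent::Initialize),
        "process" => Some(CommandEvent::StartProcessing),
        "done" => Some(CommandEvent::FinishProcessing),
        _ => None,
    }
}

// Feed input lines through the machine, skipping unrecognised words.
fn run_commands(inputs: &[&str]) -> CommandState {
    inputs
        .iter()
        .filter_map(|line| parse_command(line))
        .fold(CommandState::Idle, |state, event| state.transition(event))
}

fn main() {
    // The first "process" arrives before "init", so it has no effect.
    let state = run_commands(&["process", "init", "process"]);
    println!("{:?}", state); // prints "Processing"
}
```

Folding over the events keeps the driver free of mutable state; a long-running CLI would more likely hold the current state in a struct, as the Connection example does.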

Advantages of Using Rust for State Machines

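Rust’s enums and pattern matching provide an efficient, type-safe way to create state machines. The compiler requires every match to be exhaustive, so each state-event pair has to be handled somewhere. Note that a catch-all arm such as (state, _) satisfies this check on its own; if you want the compiler to flag unhandled pairs when a new state or event is added, list the remaining pairs explicitly instead. Additionally: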

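  • Ownership and Lifetimes: Because transition takes the state by value and returns the next one, each transition produces a new value instead of mutating shared data; when a state owns non-Copy data, the previous state is moved and can no longer be used by accident.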
  • Pattern Matching: Pattern matching allows concise and readable code, making state transitions easy to follow.
  • Enums with Data: Rust enums can hold additional data for each state, providing more flexibility in complex state machines.

Rust’s approach to handling state machines is both expressive and ensures that your code remains safe and predictable. This makes Rust particularly suited for applications that require strict state management, such as networking or command-processing applications.
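As a sketch of a stricter alternative to the catch-all arm, the hypothetical try_transition function below returns a Result and lists every rejected pair explicitly. The function name and the error message format are assumptions, not part of the examples above:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum CommandState {
    Idle,
    Initialized,
    Processing,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum CommandEvent {
    Initialize,
    StartProcessing,
    FinishProcessing,
}

// Hypothetical strict transition: invalid pairs are reported, not ignored.
fn try_transition(state: CommandState, event: CommandEvent) -> Result<CommandState, String> {
    use CommandEvent::*;
    use CommandState::*;
    match (state, event) {
        (Idle, Initialize) => Ok(Initialized),
        (Initialized, StartProcessing) => Ok(Processing),
        (Processing, FinishProcessing) => Ok(Initialized),
        // No `_` arm: every remaining pair is spelled out, so adding a new
        // state or event stops this match from compiling until it is handled.
        (Idle, StartProcessing | FinishProcessing)
        | (Initialized, Initialize | FinishProcessing)
        | (Processing, Initialize | StartProcessing) => {
            Err(format!("cannot apply {:?} while {:?}", event, state))
        }
    }
}

fn main() {
    println!("{:?}", try_transition(CommandState::Idle, CommandEvent::Initialize));
    println!("{:?}", try_transition(CommandState::Idle, CommandEvent::StartProcessing));
}
```

The trade-off is verbosity: the explicit list grows with the number of states and events, but the compiler then points at every match that needs updating.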

Conclusion

State machines are a powerful tool for managing structured transitions between states. Rust’s enums and pattern matching make implementing these machines straightforward, with added safety and performance benefits. By taking advantage of Rust’s type system, we can create state machines that are both readable and resistant to invalid transitions.

Reader Writer Locking

Introduction

The Reader-Writer problem is a classic synchronization problem that explores how multiple threads access shared resources when some only need to read the data, while others need to write (or modify) it.

In this problem:

  • Readers can access the resource simultaneously, as they only need to view the data.
  • Writers require exclusive access because they modify the data, and having multiple writers or a writer and a reader simultaneously could lead to data inconsistencies.

In Rust, this problem is a great way to explore RwLock (read-write lock), which allows us to grant multiple readers access to the data but restricts it to a single writer at a time.

Implementing

Here’s a step-by-step guide to implementing a simple version of this problem in Rust.

  1. Set up a shared resource: We’ll use an integer counter that both readers and writers will access.
  2. Create multiple readers and writers: Readers will print the current value, while writers will increment the value.
  3. Synchronize access: Using RwLock, we’ll let readers access the counter simultaneously while giving each writer exclusive access, so readers and other writers wait while a write is in progress.

Setting Up Shared State

To manage shared access to the counter, we use Arc<RwLock<T>>. Arc allows multiple threads to own the same data, and RwLock ensures that we can have either multiple readers or a single writer at any time.
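Before bringing threads into it, the "many readers or one writer" rule can be seen on a single thread. The sketch below uses a hypothetical helper, writer_blocked_while_reading, together with the standard library's non-blocking try_write:

```rust
use std::sync::{Arc, RwLock};

// Hypothetical helper: takes two read guards at once, then checks that a
// non-blocking write attempt is refused while they are alive.
fn writer_blocked_while_reading(lock: &RwLock<i32>) -> bool {
    let first = lock.read().unwrap();
    let second = lock.read().unwrap();
    // try_write returns an error instead of waiting when the lock is busy.
    let refused = lock.try_write().is_err();
    refused && *first == *second
} // both read guards are dropped here

fn main() {
    let counter = Arc::new(RwLock::new(0));
    let shared = Arc::clone(&counter); // a second owner of the same lock

    println!("writer refused while reading: {}", writer_blocked_while_reading(&counter));

    // With no guards alive, the write lock is granted immediately.
    *counter.write().unwrap() += 1;
    println!("value seen through the other Arc: {}", *shared.read().unwrap());
}
```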

Here’s the initial setup:

use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

fn main() {
    // Shared counter, initially 0, wrapped in RwLock and Arc for thread-safe access
    let counter = Arc::new(RwLock::new(0));

    // Vector to hold all reader and writer threads
    let mut handles = vec![];

Creating Reader Threads

Readers will read the counter’s value and print it. Since they only need to view the data, they’ll acquire a read lock on the RwLock.

Here’s how a reader thread might look:

    // create 5 reader threads
    for i in 0..5 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            loop {
                // acquire a read lock
                let read_lock = counter.read().unwrap();
                
                println!("Reader {} sees counter: {}", i, *read_lock);
                
                // simulate work
                thread::sleep(Duration::from_millis(100)); 
            }
        });
        handles.push(handle);
    }

Each reader:

  • Clones the Arc so it has its own reference to the shared counter.
  • Acquires a read lock with counter.read(), which allows multiple readers to access it simultaneously.
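  • Prints the counter value and then waits briefly, simulating reading work. The read lock stays held during the wait and is released when read_lock goes out of scope at the end of each loop iteration.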

Creating Writer Threads

Writers need exclusive access, as they modify the data. Only one writer can have a write lock on the RwLock at a time.

Here’s how we set up a writer thread:

    // create 2 writer threads
    for i in 0..2 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            loop {
                // acquire a write lock
                let mut write_lock = counter.write().unwrap();
                *write_lock += 1;
                println!("Writer {} increments counter to: {}", i, *write_lock);
                thread::sleep(Duration::from_millis(150)); // Simulate work
            }
        });
        handles.push(handle);
    }

Each writer:

  • Clones the Arc to access the shared counter.
  • Acquires a write lock with counter.write(). When a writer holds this lock, no other readers or writers can access the data.
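  • Increments the counter and waits, simulating writing work. The write lock stays held during the wait and is released when write_lock goes out of scope at the end of each loop iteration.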

Joining the Threads

Finally, we join the threads so the main program waits for all threads to finish. Since our loops are infinite for demonstration purposes, join never returns here; in a real program you would add a termination condition, such as a shared stop flag, to shut the threads down gracefully.

    // wait for all threads to finish (they won't in this infinite example)
    for handle in handles {
        handle.join().unwrap();
    }
}
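A terminating variant might look like the following sketch. It assumes a hypothetical run_bounded function in which each writer performs a fixed number of increments and readers poll an AtomicBool stop flag; the thread counts and sleep durations are arbitrary. Guards are also scoped so that the lock is released before each sleep:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

// Hypothetical bounded version of the example: returns the final counter.
fn run_bounded(writers: usize, increments_per_writer: usize) -> usize {
    let counter = Arc::new(RwLock::new(0usize));
    let stop = Arc::new(AtomicBool::new(false));

    // Readers loop until the stop flag is set.
    let mut reader_handles = vec![];
    for i in 0..3 {
        let counter = Arc::clone(&counter);
        let stop = Arc::clone(&stop);
        reader_handles.push(thread::spawn(move || {
            while !stop.load(Ordering::SeqCst) {
                {
                    let value = counter.read().unwrap();
                    println!("Reader {} sees counter: {}", i, *value);
                } // read guard dropped before sleeping
                thread::sleep(Duration::from_millis(5));
            }
        }));
    }

    // Writers perform a fixed amount of work and then exit.
    let mut writer_handles = vec![];
    for i in 0..writers {
        let counter = Arc::clone(&counter);
        writer_handles.push(thread::spawn(move || {
            for _ in 0..increments_per_writer {
                {
                    let mut value = counter.write().unwrap();
                    *value += 1;
                    println!("Writer {} increments counter to: {}", i, *value);
                } // write guard dropped before sleeping
                thread::sleep(Duration::from_millis(2));
            }
        }));
    }

    // Wait for the writers, then tell the readers to stop and wait for them.
    for handle in writer_handles {
        handle.join().unwrap();
    }
    stop.store(true, Ordering::SeqCst);
    for handle in reader_handles {
        handle.join().unwrap();
    }

    let total = *counter.read().unwrap();
    total
}

fn main() {
    println!("Final counter: {}", run_bounded(2, 5));
}
```

The interleaving of the reader and writer lines varies from run to run, but the final counter always equals the number of writers multiplied by the increments per writer.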

Complete Code

Here’s the complete code for this problem:

use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

fn main() {
    let counter = Arc::new(RwLock::new(0));
    let mut handles = vec![];

    // Create reader threads
    for i in 0..5 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            loop {
                let read_lock = counter.read().unwrap();
                println!("Reader {} sees counter: {}", i, *read_lock);
                thread::sleep(Duration::from_millis(100));
            }
        });
        handles.push(handle);
    }

    // Create writer threads
    for i in 0..2 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            loop {
                let mut write_lock = counter.write().unwrap();
                *write_lock += 1;
                println!("Writer {} increments counter to: {}", i, *write_lock);
                thread::sleep(Duration::from_millis(150));
            }
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }
}

Key Components

  • Arc<RwLock<T>>: Arc provides shared ownership, and RwLock provides a mechanism for either multiple readers or a single writer.
  • counter.read() and counter.write(): RwLock’s .read() grants a shared read lock, and .write() grants an exclusive write lock. While the write lock is held, no other threads can acquire a read or write lock.
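  • Concurrency Pattern: This setup ensures that multiple readers can operate simultaneously without blocking each other. However, when a writer needs access, it waits until all readers finish, and once it starts, it blocks other readers and writers. Whether waiting writers or newly arriving readers are favored is not guaranteed by the standard library; it depends on the operating system’s underlying lock implementation.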
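That readers really do hold the lock at the same time can be demonstrated with a Barrier. In this sketch (the concurrent_readers function and the value 42 are assumptions for illustration), every thread takes a read lock and then waits at the barrier while still holding it; if read locks were exclusive, no thread could ever get past the barrier:

```rust
use std::sync::{Arc, Barrier, RwLock};
use std::thread;

// Spawns `readers` threads that each take a read lock and wait at a barrier
// while holding it. Returns how many threads read the expected value.
fn concurrent_readers(readers: usize) -> usize {
    let data = Arc::new(RwLock::new(42));
    let barrier = Arc::new(Barrier::new(readers));

    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let data = Arc::clone(&data);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                let guard = data.read().unwrap();
                // Returns only once every reader has reached this point,
                // which means all of them hold a read guard simultaneously.
                barrier.wait();
                *guard
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|value| *value == 42)
        .count()
}

fn main() {
    println!("{} readers held the lock together", concurrent_readers(4));
}
```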

Conclusion

The Reader-Writer problem is an excellent way to understand Rust’s concurrency features, especially RwLock. By structuring access in this way, we allow multiple readers or a single writer, which models real-world scenarios like database systems where reads are frequent but writes require careful, exclusive access.