Cogs and Levers A blog full of technical stuff

Learning Rust Part 10 - Testing and Debugging

Introduction

Rust’s testing and debugging tools make it simple to verify code behavior, measure performance, and catch errors early. The cargo test command provides a built-in testing framework, while println! and dbg! help with debugging during development. This post explores Rust’s testing capabilities, including unit and integration tests, benchmarks, assertions, and effective debugging techniques.

Unit Tests

Unit tests focus on verifying individual functions or components in isolation, ensuring each part of a program functions as expected. In Rust, unit tests are written inline in the same module as the code they test, using the #[test] attribute.

Writing Unit Tests

To define a unit test, apply the #[test] attribute to a function. Rust’s built-in macros assert!, assert_eq!, and assert_ne! allow for assertions to confirm that the test’s outcome is correct.

fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add() {
        assert_eq!(add(2, 3), 5);
    }
}

Running Tests

Run unit tests with cargo test. All functions marked with #[test] will execute, and results are displayed in the terminal.

cargo test

Integration Tests

Integration tests verify the interactions between multiple modules, ensuring that different components of a codebase work as intended. Rust’s integration tests are placed in a tests directory at the project’s root.

Creating an Integration Test

Each file in the tests directory acts as a separate integration test. These tests can access any public functions or types within the library crate.

// tests/integration_test.rs

use my_project::add;

#[test]
fn test_add() {
    assert_eq!(add(5, 5), 10);
}

Benchmarks and Performance Testing

Rust’s test crate includes benchmarking features for performance testing, which can be run with cargo bench using Rust’s nightly version.

cargo +nightly bench

Writing a Benchmark

The test crate allows you to measure the execution time of specific functions, helping identify areas for optimization.

#![feature(test)]
extern crate test;

use test::Bencher;

fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[bench]
fn bench_add(b: &mut Bencher) {
    b.iter(|| add(10, 20));
}

Run benchmarks using the nightly compiler with cargo +nightly bench, which provides performance insights into each function marked with #[bench].

cargo +nightly bench

Assertions and Custom Testing Utilities

Assertions are key to ensuring code correctness, verifying that conditions hold true. Rust provides several built-in macros for making assertions:

  • assert!: Checks if a condition is true.
  • assert_eq! and assert_ne!: Verify equality and inequality, displaying both values if the assertion fails.
  • assert!(condition, "message"): Adds a custom message if the assertion fails.
#[test]
fn test_condition() {
    let value = 5;
    assert!(value > 2, "Value should be greater than 2");
    assert_eq!(value, 5, "Value should be equal to 5");
}

Custom Assertion Functions

Custom assertion functions can make tests more readable and reusable.

fn is_even(n: i32) -> bool {
    n % 2 == 0
}

#[cfg(test)]
mod tests {
    use super::*;

    fn assert_even(n: i32) {
        assert!(is_even(n), "{} is not even", n);
    }

    #[test]
    fn test_assert_even() {
        assert_even(4);
    }
}

Documentation Testing

Rust’s documentation testing verifies examples in documentation comments to ensure they stay accurate as the code evolves. These tests are written in doc comments (///) and are run with cargo test.

Writing Documentation Tests

Code examples can be embedded within doc comments using triple backticks (```). Rust will automatically test these examples.

/// Adds two numbers together.
///
/// # Examples
///
/// ```
/// let result = my_crate::add(2, 3);
/// assert_eq!(result, 5);
/// ```
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

Documentation tests provide helpful examples for users and ensure the code continues to function as expected.

Debugging with println! and dbg!

Rust’s println! macro is widely used to inspect values or track program flow during development. The dbg! macro, however, offers more context, displaying both the expression and its result along with file and line information.

Using println! for Debugging

The println! macro outputs information to the console, allowing you to monitor variables and messages.

fn calculate(a: i32) -> i32 {
    println!("Calculating for a = {}", a);
    a * 2
}

Using dbg! for Enhanced Debugging

The dbg! macro shows both the expression and its result, making it useful for evaluating complex expressions. dbg! outputs to standard error, keeping it separate from normal program output.

fn main() {
    let value = 10;
    let doubled = dbg!(value * 2); // Prints: [src/main.rs:4] value * 2 = 20
}

Summary

Rust’s testing and debugging tools simplify the process of validating and refining code, from unit and integration tests to benchmarks and custom assertions. With println! and dbg! macros for debugging, plus documentation testing to keep examples up-to-date, Rust equips developers with the tools needed to build reliable, high-performance applications.

Learning Rust Part 3 - Error Handling

Introduction

Rust’s error handling model focuses on safety and reliability, providing structured patterns that allow developers to manage recoverable and unrecoverable errors without exceptions. This post explains Rust’s key error-handling tools, including Result and Option types, the ? operator, and custom error types.

Result and Option Types

In Rust, the Result and Option types help manage possible errors at compile time, providing clear patterns for handling expected and unexpected outcomes.

  • Result<T, E>: Used for functions that may succeed or fail. The Result type holds two variants: Ok(T) for success and Err(E) for error.
    fn divide(a: f64, b: f64) -> Result<f64, String> {
        if b == 0.0 {
            Err(String::from("Cannot divide by zero"))
        } else {
            Ok(a / b)
        }
    }
    
  • Option<T>: Indicates the possibility of a missing value. It has two variants: Some(T) for a value and None for absence.
    fn get_item(index: usize) -> Option<&'static str> {
        let items = vec!["apple", "banana", "cherry"];
        items.get(index)
    }
    

Unwrapping and Safe Patterns

While unwrap can retrieve a value from Result or Option, it will panic if the value is Err or None. Safer handling patterns are preferred for production code to avoid panics.

Using match with Result

Using match allows us to handle both the success and error cases.

match divide(10.0, 0.0) {
    Ok(value) => println!("Result: {}", value),
    Err(e) => println!("Error: {}", e),
}

Using if let with Option

With if let, we can easily check for the presence of a value.

if let Some(item) = get_item(1) {
    println!("Found item: {}", item);
} else {
    println!("Item not found");
}

Providing Default Values with unwrap_or

The unwrap_or and unwrap_or_else methods allow a fallback value for Err or None.

let value = divide(10.0, 0.0).unwrap_or(0.0);

Error Propagation with the ? Operator

Rust’s ? operator simplifies error propagation in functions that return Result or Option. If an error occurs, ? will return it immediately to the caller, enabling cleaner code with fewer explicit match or unwrap blocks.

fn calculate(a: f64, b: f64) -> Result<f64, String> {
    let result = divide(a, b)?; // Error is propagated if `divide` returns `Err`
    Ok(result + 10.0)
}

Rules for Using ?

The ? operator is only available in functions that return Result or Option. If an error occurs, it will be converted into the return type of the function, allowing for elegant chaining of potentially failing operations.

Panic and Recoverable Errors

Rust differentiates between recoverable errors (handled with Result or Option) and unrecoverable errors (handled with panic!). While panic! stops execution in the case of a critical error, Rust recommends using it sparingly.

Using panic! Wisely

The panic! macro is best reserved for unrecoverable errors that require the program to halt, whereas most errors should be handled with Result or Option.

fn risky_function() {
    panic!("An unrecoverable error occurred");
}

Custom Error Types

For complex applications, custom error types allow fine-grained error handling and more expressive error messages. Custom error types in Rust are usually implemented with the std::fmt::Display and std::error::Error traits.

Defining a Custom Error Type

Creating a custom error type can help differentiate between various error scenarios in a Rust application.

use std::fmt;

#[derive(Debug)]
enum MathError {
    DivisionByZero,
    NegativeRoot,
}

impl fmt::Display for MathError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            MathError::DivisionByZero => write!(f, "Cannot divide by zero"),
            MathError::NegativeRoot => write!(f, "Cannot compute the square root of a negative number"),
        }
    }
}

fn divide(a: f64, b: f64) -> Result<f64, MathError> {
    if b == 0.0 {
        Err(MathError::DivisionByZero)
    } else {
        Ok(a / b)
    }
}

Custom error types support the ? operator, allowing more readable and maintainable error handling across complex codebases.

Logging and Debugging Techniques

Logging is crucial for tracking and debugging errors. Rust provides multiple logging options:

println! for Basic Logging

Simple logging with println! is useful for quick debugging.

println!("This is a basic log message");

Using the log Crate

For more structured logging, the log crate provides multi-level logging capabilities and works with backends like env_logger or log4rs.

use log::{info, warn, error};

fn main() {
    info!("Starting application");
    warn!("This is a warning");
    error!("This is an error message");
}

Debugging with dbg!

The dbg! macro prints a debug message with the file and line number, ideal for inspecting variable values.

let x = 5;
dbg!(x * 2); // Outputs: [src/main.rs:4] x * 2 = 10

Additional Debugging Tools

  • Compiler Error Messages: Rust’s detailed compiler errors help identify issues early.
  • Cargo Check: Quickly identifies syntax errors without a full compile using cargo check.
  • Cargo Test: Run cargo test to validate the application and capture edge cases.

Conclusion

Rust’s error handling model promotes safe, reliable code by providing structured tools like Result, Option, and the ? operator for managing recoverable and unrecoverable errors. With custom error types and logging options, Rust empowers developers to write robust, maintainable applications. By enforcing careful error handling, Rust encourages a proactive approach to managing failures, making it ideal for building reliable systems.

Learning Rust Part 2 - Memory Safety

Introduction

Rust’s approach to memory safety is one of the language’s core features, allowing developers to write efficient, low-level code without a garbage collector. This post will dive into Rust’s memory safety model, explaining the ownership system, borrowing and lifetimes, garbage collection alternatives, and how Rust leverages RAII (Resource Acquisition Is Initialization) to ensure safe memory handling.

Ownership Model

Rust’s ownership model is central to its memory safety guarantees. In Rust, every value has a unique owner, and when this owner goes out of scope, Rust automatically cleans up the associated memory. This system avoids many common bugs found in other languages, such as use-after-free and double-free errors.

Key Rules of Ownership

  • Ownership: Each value in Rust has a unique owner.
  • Move Semantics: When an owner variable is assigned to another variable, the original owner loses access.

Here’s a basic example that shows ownership transfer in Rust:

fn main() {
    let s1 = String::from("hello");
    let s2 = s1; // `s1` is moved to `s2`, `s1` is now invalid
    println!("{}", s2); // Valid
    // println!("{}", s1); // Error: `s1` is invalidated
}

By enforcing ownership rules, Rust guarantees memory safety without the need for a garbage collector.

References and Borrowing Rules

Rust’s borrowing system works alongside ownership, allowing references to data so that functions can access values without taking ownership. Rust enforces strict rules to prevent data races, ensuring safe concurrent programming.

Borrowing Rules

  • Borrowing: Allows functions to temporarily access data without taking ownership.
  • Immutable Borrowing: & allows read-only access, multiple are allowed simultaneously.
  • Mutable Borrowing: &mut allows read-and-write access, but only one mutable reference can exist at a time.
  • References: must always be valid.

Here’s how Rust handles immutable and mutable references:

fn main() {
    let mut data = String::from("hello");

    // Immutable borrow
    let r1 = &data;
    let r2 = &data;

    println!("r1: {}, r2: {}", r1, r2);

    // Mutable borrow
    let r3 = &mut data;
    r3.push_str(", world!");
    println!("r3: {}", r3);
}

The borrowing rules prevent data races by allowing either multiple immutable references or a single mutable reference, but never both simultaneously.

Lifetimes and Scope

To further promote memory safety, Rust uses lifetimes to ensure that references do not outlive the data they point to, avoiding dangling references.

Lifetime Annotations

Rust infers lifetimes in many cases, but explicit lifetime annotations are sometimes necessary, particularly in functions. Here’s an example:

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

The 'a annotation ensures that the returned reference will live as long as the input references, guaranteeing that the reference is valid.

Lifetimes in Structs

Lifetimes are also useful in structs, helping ensure that struct members don’t outlive the data they refer to.

struct Important<'a> {
    text: &'a str,
}

fn main() {
    let message = String::from("Hello, world!");
    let important = Important { text: &message };
}

Garbage Collection Alternatives

Rust’s ownership and borrowing rules function as a compile-time garbage collector, eliminating the need for runtime garbage collection. This model provides several benefits:

  • Predictable Performance: No garbage collection pauses.
  • Lower Memory Overhead: Efficient stack and heap memory usage.
  • Reduced Runtime Errors: Compile-time checks prevent many common runtime crashes.

Memory Leaks and Handling

While Rust’s ownership and borrowing rules prevent most memory leaks, they can still occur in cases involving reference cycles in data structures. For example, using Rc (Reference Counted) pointers can lead to memory leaks if cycles are not broken with Weak references.

Using Weak references prevents cyclic dependencies between nodes in complex data structures, such as trees or graphs.

use std::rc::{Rc, Weak};

struct Node {
    parent: Option<Weak<Node>>,
    children: Vec<Rc<Node>>,
}

In this example, Weak references are used to ensure that the parent node doesn’t keep a strong reference to its children, breaking any potential reference cycle.

Drop Trait and RAII (Resource Acquisition Is Initialization)

Rust follows the RAII principle, where resources are automatically released when they go out of scope. The Drop trait allows developers to define custom clean-up behavior for resources, ensuring they’re properly managed.

Implementing Drop

The Drop trait provides a drop method that runs automatically when an object is no longer needed.

struct Resource {
    name: String,
}

impl Drop for Resource {
    fn drop(&mut self) {
        println!("Releasing resource: {}", self.name);
    }
}

fn main() {
    let _res = Resource { name: String::from("file.txt") };
} // `_res` goes out of scope here, calling `drop`

RAII in Rust

With RAII, resources like files and network connections are closed as soon as they’re no longer used. This minimizes the chance of resource leaks, and many standard library types in Rust implement Drop to handle resource deallocation automatically.

Conclusion

Rust’s approach to memory safety stands out for its compile-time checks, which enforce safe memory handling through ownership, borrowing, and lifetimes. By relying on these principles instead of a runtime garbage collector, Rust enables developers to write efficient, high-performance applications with minimal risk of memory-related errors. For developers looking to harness both power and safety, Rust offers a comprehensive memory management model that is well worth the investment.

Learning Rust Part 1 - Language Basics

Introduction

Welcome to our series on the Rust programming language! Rust has been gaining a lot of attention in the programming community thanks to its focus on performance, safety, and concurrency. Originally developed by Mozilla, Rust is designed to eliminate many common programming errors at compile time, particularly around memory safety and data races, making it an appealing choice for systems programming and applications requiring high reliability.

In this series, we’ll start with Rust basics, gradually diving into its unique features and core language concepts. Whether you’re coming from a background in languages like C++, Python, or JavaScript, or completely new to programming, this series will help you build a strong foundation in Rust. We’ll look at its syntax and semantics, explore how ownership works, and understand the lifetimes of data—key concepts that make Rust unique.

This first post will guide you through the language essentials, laying the groundwork for deeper topics in future posts.

We’ll cover the following language basics:

  • Syntax and Semantics We’ll start with an overview of Rust’s syntax and how it differs from other languages. You’ll learn about basic expressions, code structure, and how Rust’s strict compiler enforces code quality.

  • Variables and Mutability Rust’s approach to variables and mutability is unique among languages, emphasizing safety by making all variables immutable by default. We’ll explain why this is and how to work with mutable variables when needed.

  • Data Types Rust is a statically typed language, which means the type of each variable must be known at compile time. We’ll explore Rust’s basic data types and how they’re used in programs.

  • Primitive Types Rust offers a range of primitive types, including integers, floating-point numbers, booleans, and characters. Understanding these types and how to work with them is crucial as you start writing Rust code.

  • Constants and Static Variables Constants and static variables are essential for defining fixed values in Rust. We’ll explain the differences between them, as well as when and why to use each.

  • Control Structures Control structures are the basic building blocks for controlling the flow of execution in your programs. We’ll show you how to use the familiar keywords if, loop, while, and for.

  • Pattern Matching Pattern matching is a powerful feature in Rust, providing expressive syntax for conditional branching. We’ll show you how to use the match statement and other forms of pattern matching effectively.

  • Functions and Closures Finally, we’ll cover functions and closures. Rust’s functions are straightforward, but closures (anonymous functions) bring flexibility to Rust’s syntax, especially for functional programming patterns.

Each section in this post is designed to build on the last, creating a comprehensive introduction to the Rust language’s basics. By the end, you’ll have a solid understanding of Rust’s core language features and a foundation to explore more advanced concepts in subsequent posts.

Syntax and Semantics

Basic Program Structure

Every Rust program begins execution in a function named main. Unlike some languages where a main function is optional, Rust requires fn main() as an entry point.

fn main() {
    println!("Hello, world!");
}

Breaking It Down

  • fn defines a function, followed by the function name main.
  • () indicates that main takes no parameters in this example.
  • Curly braces {} are used to define the function’s scope.
  • println! is a macro that prints text to the console, with a ! indicating it’s a macro rather than a function. Rust macros are powerful, but for now, think of println! as equivalent to print or printf in other languages.

Expressions and Statements

Rust is an expression-based language, which means that many parts of the code return a value. For example, the final line of a block (without a semicolon) can act as a return value:

fn add_one(x: i32) -> i32 {
    x + 1 // No semicolon, so this returns the value of `x + 1`
}
  • Expressions (like x + 1 above) return a value and don’t end in a semicolon.
  • Statements perform actions but don’t return a value, often ending with a semicolon.

Rust’s expression-based nature allows for concise and functional-looking code, as shown below:

let result = if x > 0 { x } else { -x }; // Inline expression in an `if` statement

Enforced Code Quality: Compiler Strictness

Rust’s compiler is notoriously strict, which is a feature, not a bug! This strictness catches common mistakes and enforces safe memory practices. Here’s how it affects code structure and quality:

Unused Variables: The compiler warns about unused variables, nudging you to write clean, intentional code.

let x = 42; // Warning if `x` is unused

You can silence these warnings by prefixing variables with an underscore:

let _x = 42; 

Immutable by Default: Variables are immutable unless explicitly marked with mut, encouraging safer programming patterns.

let mut counter = 0; // `counter` can now be modified
counter += 1;

Type Inference with Explicit Typing Encouragement: Rust’s compiler can infer types, but you can (and sometimes should) specify them for clarity and error prevention.

let count: i32 = 10; // Explicit type annotation for clarity

Error Messages: Rust’s Friendly Compiler

Rust’s compiler is known for its friendly and informative error messages. When your code doesn’t compile, Rust will often give suggestions or hints on how to fix it. For example, a typo in a variable name might prompt an error message with suggestions for the correct spelling.

fn main() {
    let x = 10;
    println!("Value of x: {}", y); 
}

The code above will have the compiler emitting messages like this:

-> src/main.rs:5:32
  |
5 |     println!("Value of x: {}", y); 
  |                                ^ help: a local variable with a similar name exists: `x`

Rust’s insistence on safe code often means dealing with the compiler more than in other languages. However, this leads to fewer runtime errors and safer, more reliable programs.

Comments in Rust

Comments in Rust are straightforward and follow conventions you might know from other languages.

  • Single-line comments use //.
// This is a single-line comment
  • Multi-line comments use /* */.
/* This is a
   multi-line comment */

Rust also has documentation comments that generate HTML documentation for code, using /// before functions or modules.

/// This function adds one to the input
fn add_one(x: i32) -> i32 {
    x + 1
}

Data Types

Rust has a rich type system designed to prevent errors and ensure safety. Every variable in Rust has a type, either assigned explicitly or inferred by the compiler.

Scalar Types

  • Integer Types: i8, i16, i32, i64, i128, isize (signed); u8, u16, u32, u64, u128, usize (unsigned).
let x: i32 = -10; // 32-bit signed integer
let y: u8 = 255;  // 8-bit unsigned integer
  • Floating Point Types: f32 (single-precision), f64 (double-precision).
let a: f64 = 3.1415;
let b: f32 = 2.5;
  • Boolean Type: bool, which has two values, true and false.
let is_active: bool = true;
  • Character Type: char, representing a single Unicode scalar value.
let letter: char = 'A';
let emoji: char = '😊';

Compound Types

  • Tuples: Group multiple values of potentially different types
let person: (&str, i32) = ("Alice", 30);
  • Arrays: Fixed-size lists of values of a single type.
let numbers: [i32; 3] = [1, 2, 3];

Constants and Static Variables

Constants

Constants are immutable values defined with const and are global within the scope they’re declared in. Constants must have explicit types and are evaluated at compile time.

const PI: f64 = 3.14159;

Static Variables

Static variables are similar to constants but have a fixed memory address. They can be mutable (with static mut), though this is unsafe.

static VERSION: &str = "1.0";

Control Structures

Rust has similar control structures to C and C++, but with a few distinct Rust-specific behaviors and syntax nuances. Here’s a quick rundown:

  • if: Works similarly to C/C++ but must have a boolean condition (no implicit integer-to-boolean conversions).
let number = 5;
if number > 0 {
    println!("Positive");
} else if number < 0 {
    println!("Negative");
} else {
    println!("Zero");
}
  • loop: Rust’s equivalent to while(true). It’s an infinite loop but can return values using the break keyword.
let mut count = 0;
let result = loop {
    count += 1;
    if count == 10 {
        break count * 2;
    }
};
println!("Result: {}", result);
  • while: Standard while loop as in C.
let mut x = 0;
while x < 5 {
    println!("x is: {}", x);
    x += 1;
}
  • for: Rust’s for loop is typically used with ranges or iterators (no traditional C-style for loop).
for i in 0..5 {
    println!("i is: {}", i);
}

The 0..5 syntax creates a range from 0 to 4. You can also use 0..=5 for an inclusive range from 0 to 5.

Pattern Matching

Rust’s match statement is a powerful control-flow construct that can deconstruct enums, arrays, and tuples.

Using Match with Integers

let number = 7;

match number {
    1 => println!("One"),
    2 | 3 | 5 | 7 => println!("Prime"),
    _ => println!("Other"),
}

Matching with Enums

Pattern matching is particularly useful with enums, as it enables exhaustive handling of each variant.

enum Color {
    Red,
    Green,
    Blue,
}

fn print_color(color: Color) {
    match color {
        Color::Red => println!("Red"),
        Color::Green => println!("Green"),
        Color::Blue => println!("Blue"),
    }
}

Destructuring in Patterns

Rust allows destructuring in match expressions to work with complex data types.

let pair = (1, 2);

match pair {
    (0, _) => println!("First is zero"),
    (_, 0) => println!("Second is zero"),
    _ => println!("No zeros"),
}

Functions and Closures

Functions and closures are both core components of Rust’s programming model.

Functions

Functions are defined with fn and require explicit types for all parameters. Optionally, a function can return a value.

fn add(x: i32, y: i32) -> i32 {
    x + y
}

Closures

Closures are anonymous functions that can capture their environment, and they are commonly used for iterators and callback functions.

let add = |a: i32, b: i32| a + b;
println!("Result: {}", add(5, 10));

Closures infer parameter and return types, but they can also be explicitly typed if needed.

Summary

Rust’s syntax is familiar yet refined, with an expression-oriented structure that keeps code concise. Rust’s strict compiler catches potential issues early, helping you write robust code from the beginning. With these basics, you’ll be ready to dive deeper into Rust’s core features, like variables, mutability, and ownership.

Simple Hashing Algorithms

Introduction

Hash functions are essential in computer science, widely used in data structures, cryptography, and applications like file integrity checks and digital signatures. This post will explore a few well-known hash functions: djb2, Jenkins, Murmur3, and FNV-1a. We’ll discuss each function’s approach and use cases, then dive into sample C implementations for each.

What is a Hash Function?

In brief, a hash function takes input data (e.g., a string) and returns a fixed-size integer, or “hash,” that represents the original data. Ideal hash functions distribute data uniformly, minimizing collisions (when different inputs generate the same hash value). While cryptographic hash functions like SHA-256 prioritize security, our focus here will be on non-cryptographic hashes for tasks such as data lookup and unique identifier generation.

djb2

The djb2 hash function, designed by Daniel J. Bernstein, is a simple yet effective algorithm for hashing strings. Its operation is lightweight, using bit shifts and additions, making it fast and suitable for many non-cryptographic purposes. The main advantage of djb2 lies in its simplicity, which is also why it is commonly used in hash table implementations.

Code

/**
 * @brief Hashes a string using the djb2 algorithm
 * @param str The string to hash
 * @return The hash of the string
 */
uint64_t ced_hash_djb2(const void *key, size_t len) {
    uint64_t hash = 5381;
    const unsigned char *str = key;

    while (len--) {
        hash = ((hash << 5) + hash) + *str++;
    }

    return hash;
}

Explanation

In djb2, we initialize the hash with 5381 and iterate over each character of the string. The main hashing logic is hash = ((hash << 5) + hash) + *str++, which essentially combines shifts and additions for a computationally light transformation.

Jenkins Hash

The Jenkins hash function, created by Bob Jenkins, is popular for its performance and quality of distribution. Jenkins functions are commonly used for hash tables and are generally effective at handling common hashing requirements without high computational overhead.

Code

/**
 * @brief Hashes a string using the Jenkins algorithm
 * @param key The key to hash
 * @param length The length of the key
 * @return The hash of the string
 */
uint32_t ced_hash_jenkins(const void *key, size_t length) {
    uint32_t hash = 0;
    const uint8_t *data = (const uint8_t *)key;

    for (size_t i = 0; i < length; ++i) {
        hash += data[i];
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }

    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);

    return hash;
}

Explanation

In this implementation, each byte of the input affects the entire hash state via a series of shifts and XORs. These bitwise operations mix the bits thoroughly, helping reduce the chances of hash collisions, especially for small or repetitive data inputs.

Murmur3

Murmur3 is part of a family of hash functions known for their speed and good distribution characteristics. Designed by Austin Appleby, Murmur3 performs exceptionally well on large datasets and is commonly used in database indexing, distributed systems, and other applications where hash quality and performance are paramount.

Code

/**
 * @brief Hashes a string using the Murmur3 algorithm
 * @param key The key to hash
 * @param length The length of the key
 * @param seed The seed value for the hash
 * @return The hash of the string
 */
uint32_t ced_hash_murmur(const void *key, size_t length, uint32_t seed) {
    const uint32_t m = 0x5bd1e995;
    const int r = 24;

    uint32_t hash = seed ^ length;
    const uint8_t *data = (const uint8_t *)key;

    while (length >= 4) {
        uint32_t k = *(uint32_t *)data;

        k *= m;
        k ^= k >> r;
        k *= m;

        hash *= m;
        hash ^= k;

        data += 4;
        length -= 4;
    }

    switch (length) {
        case 3: hash ^= data[2] << 16;
        case 2: hash ^= data[1] << 8;
        case 1: hash ^= data[0];
            hash *= m;
    }

    hash ^= hash >> 13;
    hash *= m;
    hash ^= hash >> 15;

    return hash;
}

Explanation

Murmur3 processes input in 4-byte chunks, applying a seed-based hashing operation with bit shifts to achieve randomness. This function is optimized for speed and provides excellent performance, particularly in scenarios where uniform hash distribution is critical.

FNV-1a

The FNV-1a hash is another widely used, fast, non-cryptographic hash function. It is well-known for its simplicity and reasonable distribution properties. FNV-1a is often used for smaller data structures like hash tables and is compatible with both small and large datasets.

/**
 * @brief Hashes a string using the FNV1a algorithm
 * @param key The key to hash
 * @param length The length of the key
 * @return The hash of the string
 */
uint32_t ced_hash_fnv1a(const void *key, size_t length) {
    const uint32_t offset_basis = 2166136261;
    const uint32_t prime = 16777619;

    uint32_t hash = offset_basis;
    const uint8_t *data = (const uint8_t *)key;

    for (size_t i = 0; i < length; ++i) {
        hash ^= data[i];
        hash *= prime;
    }

    return hash;
}

Explanation

FNV-1a initializes a hash value with an offset basis and iterates over each byte, XORing it with the hash and then multiplying by a prime number. This operation sequence yields a well-distributed hash while maintaining simplicity.

Summary

Each of the four hash functions reviewed here has distinct characteristics:

  • djb2: Simple and efficient, suitable for smaller data and hash tables.
  • Jenkins: Offers good distribution with minimal computation, ideal for hash tables.
  • Murmur3: Fast and optimized for larger data, making it ideal for database indexing and distributed applications.
  • FNV-1a: Simple and widely used, especially in situations where lightweight and straightforward hash computation is required.

Choosing the right hash function depends on the specific requirements of your application, particularly the trade-offs between speed, distribution quality, and memory usage. The implementations shared here provide a starting point for integrating efficient hashing techniques in C.