Simple Hashing Algorithms
26 Oct 2024Introduction
Hash functions are essential in computer science, widely used in data structures, cryptography, and applications like file integrity checks and digital signatures. This post will explore a few well-known hash functions: djb2, Jenkins, Murmur3, and FNV-1a. We’ll discuss each function’s approach and use cases, then dive into sample C implementations for each.
What is a Hash Function?
In brief, a hash function takes input data (e.g., a string) and returns a fixed-size integer, or “hash,” that represents the original data. Ideal hash functions distribute data uniformly, minimizing collisions (when different inputs generate the same hash value). While cryptographic hash functions like SHA-256 prioritize security, our focus here will be on non-cryptographic hashes for tasks such as data lookup and unique identifier generation.
djb2
The djb2 hash function, designed by Daniel J. Bernstein, is a simple yet effective algorithm for hashing strings.
Its operation is lightweight, using bit shifts and additions, making it fast and suitable for many non-cryptographic
purposes. The main advantage of djb2
lies in its simplicity, which is also why it is commonly used in hash table
implementations.
Code
Explanation
In djb2
, we initialize the hash with 5381
and iterate over each character of the string. The main hashing logic is
hash = ((hash << 5) + hash) + *str++
, which essentially combines shifts and additions for a computationally light
transformation.
Jenkins Hash
The Jenkins hash function, created by Bob Jenkins, is popular for its performance and quality of distribution. Jenkins functions are commonly used for hash tables and are generally effective at handling common hashing requirements without high computational overhead.
Code
Explanation
In this implementation, each byte of the input affects the entire hash state via a series of shifts and XORs. These bitwise operations mix the bits thoroughly, helping reduce the chances of hash collisions, especially for small or repetitive data inputs.
Murmur3
Murmur3 is part of a family of hash functions known for their speed and good distribution characteristics. Designed by Austin Appleby, Murmur3 performs exceptionally well on large datasets and is commonly used in database indexing, distributed systems, and other applications where hash quality and performance are paramount.
Code
Explanation
Murmur3 processes input in 4-byte chunks, applying a seed-based hashing operation with bit shifts to achieve randomness. This function is optimized for speed and provides excellent performance, particularly in scenarios where uniform hash distribution is critical.
FNV-1a
The FNV-1a hash is another widely used, fast, non-cryptographic hash function. It is well-known for its simplicity and reasonable distribution properties. FNV-1a is often used for smaller data structures like hash tables and is compatible with both small and large datasets.
Explanation
FNV-1a initializes a hash value with an offset basis and iterates over each byte, XORing it with the hash and then multiplying by a prime number. This operation sequence yields a well-distributed hash while maintaining simplicity.
Summary
Each of the four hash functions reviewed here has distinct characteristics:
- djb2: Simple and efficient, suitable for smaller data and hash tables.
- Jenkins: Offers good distribution with minimal computation, ideal for hash tables.
- Murmur3: Fast and optimized for larger data, making it ideal for database indexing and distributed applications.
- FNV-1a: Simple and widely used, especially in situations where lightweight and straightforward hash computation is required.
Choosing the right hash function depends on the specific requirements of your application, particularly the trade-offs between speed, distribution quality, and memory usage. The implementations shared here provide a starting point for integrating efficient hashing techniques in C.