Hash Function

A hash function takes an input of arbitrary size and produces an output of fixed size, called a hash, hash value, or digest. The same input always yields the same hash, but the output is much smaller than the input, so it acts as a compact fingerprint of the data. Hash functions appear in two broad roles: as the addressing mechanism inside hash tables, and as integrity and security primitives in the form of checksums and cryptographic hashes.

For hash tables, the property that matters is uniformity. The MIT 6.006 hashing lecture stresses that a good hash function should scatter keys across the available buckets as evenly as possible, so that no single bucket collects a disproportionate share of the keys. Uneven hashing degrades a hash table from expected constant time toward linear time, because crowded buckets must be searched element by element.

Cryptographic hash functions add far stronger requirements. NIST’s FIPS 180-4, the Secure Hash Standard, specifies functions such as the SHA family that map a message to a fixed-length digest while being one-way and collision-resistant. In that setting it must be computationally infeasible to recover an input from its hash (preimage resistance) or to find two inputs sharing a hash (collision resistance), which is what lets these functions back digital signatures and data-integrity checks.

The two roles share a core idea but optimize for different things. A hash-table function is chosen to be fast and to spread inputs evenly; a cryptographic function accepts much higher cost in exchange for being hard to reverse or to fool. Both, however, are deterministic maps from a large input space to a small fixed-size output, which is what makes collisions an unavoidable fact of life.

Sources

Related