Cryptographic hash function
A cryptographic hash function is a special type of hash function that takes an arbitrary-size block of data (the input) and returns a fixed-size bit string (the hash value, hash code, digest, or simply hash). Unlike general hash functions used for data structures, cryptographic hash functions are specifically designed with properties that make them suitable for use in cryptography and computer security applications.
The primary purpose of a cryptographic hash function is to create a compact, fixed-size "fingerprint" or summary of the input data. Because inputs can be arbitrarily large while the output has a fixed length, different inputs can in principle share a hash value, but a secure function makes such collisions infeasible to find. Even a tiny change in the input data will result in a significantly different hash value.
Overview and Purpose
Cryptographic hash functions are one-way functions: it is computationally easy to compute the hash value from the input data, but computationally infeasible to reverse the process – that is, to find the original input data given only the hash value.
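The asymmetry can be illustrated with a small Python sketch. It assumes SHA-256 from the standard `hashlib` module and deliberately weakens the problem by matching only the first 16 bits of a digest; the helper names `truncated_digest` and `brute_force_preimage` are hypothetical. Note that the search finds some input with a matching prefix, not necessarily the original one.

```python
import hashlib
from itertools import count


def truncated_digest(data: bytes, n_bytes: int = 2) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest (a deliberately weakened hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def brute_force_preimage(target: bytes, n_bytes: int = 2, max_tries: int = 5_000_000):
    """Try the candidate inputs b"1", b"2", ... until one matches the truncated target."""
    for tries in count(1):
        if tries > max_tries:
            return None, tries - 1  # give up; extremely unlikely for a 16-bit target
        candidate = str(tries).encode()
        if truncated_digest(candidate, n_bytes) == target:
            return candidate, tries


# Forward direction: a single cheap function call.
target = truncated_digest(b"some secret input")

# Reverse direction: with no structure to exploit, guessing is the generic strategy.
found, tries = brute_force_preimage(target)
print(f"matched 16 bits after {tries} guesses with input {found!r}")
```

Even this toy version needs tens of thousands of guesses on average, and each additional bit roughly doubles the expected work, which is why matching all 256 bits of a full digest by guessing is considered infeasible.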
They are used extensively to:
- Verify Data Integrity: Check if data (like a file or message) has been altered since its hash was originally computed. If the current hash matches the original hash, the data is likely unchanged.
- Securely Store Passwords: Store hash values of passwords instead of the passwords themselves.
- Serve as building blocks in more complex cryptographic systems like digital signatures and blockchains.
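A minimal integrity check might look like the following Python sketch. It assumes SHA-256 via the standard `hashlib` module; the function names and sample data are illustrative only, and in practice the reference digest must be obtained from a trusted source.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, expected_hex: str) -> bool:
    """Recompute the digest and compare it with the reference value."""
    # compare_digest compares in constant time, avoiding timing side channels.
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


original = b"release-1.0 contents"
published_digest = sha256_hex(original)  # computed once by the publisher

print(is_unchanged(original, published_digest))                 # True
print(is_unchanged(b"release-1.0 c0ntents", published_digest))  # False
```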
Key Properties
For a hash function to be considered cryptographic, it must possess several crucial properties:
- Deterministic
- The same input data will always produce the exact same hash value.
- Quick Computation
- It must be computationally quick and easy to calculate the hash value for any given input data.
- Avalanche Effect
- A small change in the input data (e.g., changing or adding a single bit) should result in a drastically different hash value. Even seemingly similar inputs should produce completely unrelated outputs.
- Collision Resistance (Strong Collision Resistance)
- It must be computationally infeasible to find two different input values that produce the same hash output. Finding a collision should be extremely difficult.
- Pre-image Resistance (One-Way Property)
- Given a hash value, it must be computationally infeasible to determine or reconstruct the original input data that produced that hash value. This is the "one-way" aspect.
- Second Pre-image Resistance (Weak Collision Resistance)
- Given an input value and its hash value, it must be computationally infeasible to find a different input value that produces the same hash output.
A hash function that lacks any of these properties is considered cryptographically broken and unsuitable for security applications.
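Some of these properties can be observed directly. The sketch below assumes SHA-256 from Python's `hashlib`; the helper names are hypothetical. It checks determinism and the fixed output size, then counts how many output bits change when a single input bit is flipped. Collision and pre-image resistance cannot be demonstrated by running code, only by the absence of known attacks.

```python
import hashlib


def sha256_bytes(data: bytes) -> bytes:
    """Raw 32-byte SHA-256 digest."""
    return hashlib.sha256(data).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


msg1 = b"hello world"
msg2 = b"hello worle"  # 'd' (0x64) -> 'e' (0x65): exactly one input bit flipped

h1, h2 = sha256_bytes(msg1), sha256_bytes(msg2)

# Deterministic: hashing the same input again gives the same digest.
assert sha256_bytes(msg1) == h1

# Fixed size: 32 bytes whether the input is 11 bytes or a million.
assert len(h1) == len(sha256_bytes(b"x" * 1_000_000)) == 32

# Avalanche effect: one flipped input bit changes many output bits.
print(f"input bits changed:  {differing_bits(msg1, msg2)}")
print(f"output bits changed: {differing_bits(h1, h2)} of {len(h1) * 8}")
```

For a well-designed function the second count is typically close to half of the output bits.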
Distinction from Checksums
While both checksums and cryptographic hash functions produce a fixed-size value from data and are used to check data integrity, their purposes and design goals are different:
- Checksums are designed to detect accidental errors or changes in data (e.g., due to transmission errors). They are simpler and faster to compute, but it is relatively easy to intentionally alter data while keeping the checksum the same.
- Cryptographic Hash Functions are designed to detect intentional tampering with data. Their properties like collision resistance and avalanche effect make it computationally infeasible to alter data without changing the hash value.
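The difference is easy to see with a toy example. The Python sketch below uses a deliberately simple additive checksum (sum of bytes modulo 256), which is an assumption chosen for illustration; real checksums such as CRC-32 catch far more accidental errors but are likewise not designed to resist deliberate changes.

```python
import hashlib


def additive_checksum(data: bytes) -> int:
    """Toy checksum: sum of all byte values modulo 256."""
    return sum(data) % 256


original = b"PAY 19 USD"
tampered = b"PAY 91 USD"  # two digits swapped, so the byte sum is unchanged

# The checksum cannot tell the two messages apart...
print(additive_checksum(original) == additive_checksum(tampered))  # True

# ...while the SHA-256 digests differ.
print(hashlib.sha256(original).hexdigest() == hashlib.sha256(tampered).hexdigest())  # False
```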
Usage Examples
- Password Storage: Instead of storing user passwords in plain text (which would be a security risk if the database is breached), systems store the hash of the password. When a user tries to log in, the system hashes the entered password and compares it to the stored hash. This way, the original password is never stored. (Hashing is often combined with salting, that is, mixing a random per-user value into the input, for added security.)
- Data Integrity: Before distributing a file, a software provider can compute its hash value. Users downloading the file can then compute the hash of their downloaded copy and compare it to the original hash. If they match, and the original hash was obtained from a trusted source, the file is almost certainly identical to the original and hasn't been corrupted or tampered with during download.
- Digital Signatures: Cryptographic hash functions are a key component of digital signatures. Instead of signing the entire document (which is computationally expensive), the sender hashes the document and signs only the small hash value using their private key. The recipient verifies the signature with the sender's public key against a hash they compute from the received document.
- Blockchain: Hash functions are used to link blocks of transactions together securely. Each block contains the hash of the previous block, creating a chain. Altering any data in a block would change its hash, breaking the link to every later block and making the tampering detectable.
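Two of the uses above, salted password storage and hash chaining, can be sketched with Python's standard library. All function names are hypothetical. For passwords the sketch uses PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`), a deliberately slow construction built on a hash function, rather than a single bare hash call; that choice and the iteration count are assumptions of this sketch, not something the text above prescribes. The chain is a toy model and omits everything else a real blockchain needs, such as consensus and signatures.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for illustration; tune for real deployments


def hash_password(password, salt=None):
    """Return (salt, derived_key); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, stored_key):
    """Re-derive the key from the login attempt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, stored_key)


GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash, payload):
    """Hash of a block: SHA-256 over the previous block's hash plus this block's data."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Return a list of (payload, hash) pairs, each hash covering the previous one."""
    chain, prev = [], GENESIS
    for payload in payloads:
        prev = block_hash(prev, payload)
        chain.append((payload, prev))
    return chain


def chain_is_valid(chain):
    """Recompute every block hash and check it against the stored value."""
    prev = GENESIS
    for payload, stored in chain:
        if block_hash(prev, payload) != stored:
            return False
        prev = stored
    return True


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False

chain = build_chain(["alice->bob: 5", "bob->carol: 2", "carol->dave: 1"])
print(chain_is_valid(chain))  # True
chain[1] = ("bob->carol: 200", chain[1][1])  # alter the data but keep the old hash
print(chain_is_valid(chain))  # False
```

Only the salt and the derived key would be stored per user; the password itself is never written anywhere.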
Example for Beginners
Imagine you have a book (this is your input data). You want a way to check later if the book has been changed at all, even by one word, without having to re-read the entire book.
Let's imagine a special, magical machine that acts like a cryptographic hash function:
- Input: You put your entire book into the machine. It doesn't matter if it's a short story or a huge novel.
- Output: The machine processes the book and gives you back a short code, let's say it's always a sequence like ABC-123-XYZ. This is the "fingerprint" or hash of your book. The code is always the same length, no matter how big the book was.
- Same Book, Same Code: If you put the exact same book into the machine again, it will always give you the exact same code (ABC-123-XYZ).
- One Tiny Change, Totally Different Code: Now, change just one word in the book (or even a comma). Put the slightly changed book into the machine. The code you get back will be something completely different, like QWE-789-RTY. This is the "avalanche effect": a small change makes a huge difference in the output.
- Easy to Create the Code: It's quick and easy to put a book in and get its code.
- Hard to Go Back (One-Way): If someone gives you only the code ABC-123-XYZ, it's practically impossible to figure out exactly what book (what words, in what order) was put into the machine to get that code. You can't reverse-engineer the book from the fingerprint.
- Hard to Find Another Book with the Same Code (Collision Resistance): It's incredibly difficult, for all practical purposes, to find a different book that would also produce the code ABC-123-XYZ from this magical machine. In practice, each book has its own fingerprint.
In the digital world, the "book" is your digital data (a file, a message, your password). The "magical machine" is the cryptographic hash function (like SHA-256). The "code" is the hash value.
This allows computers to quickly generate a unique fingerprint for any data. If the fingerprint ever changes, you know the data has been altered. It's a fundamental tool for ensuring security and verifying integrity without needing to process or reveal the original, possibly large or sensitive, data.
External links
- NIST: Hash Functions Project
- Khan Academy: Introduction to hash functions (Beginner explanation)
- Cloudflare: What is Hashing?