Hashing
What is hashing
Hashing is the process of calculating a fixed-length value from another value, such as a string. The value is passed into a hash function, and the function returns a hash value. Hashing is widely used in cryptography. Rather than storing a 'secret' (such as a password) as plain text, its hash value is calculated and stored in its place. If the database is compromised and an attacker can see the stored entries, they see only the hashes, not the original passwords. A cryptographic hash function is designed to be one-way: even when the function is known, it is computationally impractical to work back from a hash value to the input that produced it.
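As a rough illustration of the fixed-length property, the sketch below uses SHA-256 from Python's standard hashlib module. The helper name hash_value is made up for this example, and plain SHA-256 is used only to show the idea; it is not a recommendation for storing passwords.

```python
import hashlib


def hash_value(text: str) -> str:
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_hash = hash_value("abc")
long_hash = hash_value("a much longer input string " * 100)

# Both hashes are 64 hexadecimal characters (256 bits), whatever the input length.
print(len(short_hash), len(long_hash))

# The same input always gives the same hash; a different input gives a different one.
print(hash_value("abc") == short_hash)
print(hash_value("abd") == short_hash)
```

Only the hexadecimal string would be stored. To check a password later, the candidate is hashed again and the two hash values are compared.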
Hashing is also used for storing large quantities of data in a hash table. Each item's key is converted to an integer hash, which is then used to work out where in the table the item is stored. This has the advantage that a lookup goes more or less directly to the right location, instead of comparing the search string against every stored string in turn.
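The sketch below shows one way the integer hash can become a table position. It assumes CRC-32 (zlib.crc32) as the hash function and a hypothetical table of 64 slots; put, get and slot_for are illustrative names, and collisions are deliberately not handled here.

```python
import zlib

TABLE_SIZE = 64  # hypothetical number of slots in the table


def slot_for(key: str) -> int:
    """Convert a string key to an integer hash, then to a slot index."""
    return zlib.crc32(key.encode("utf-8")) % TABLE_SIZE


table = [None] * TABLE_SIZE


def put(key: str, value: str) -> None:
    # Overwrites whatever is in the slot; collision handling is left out on purpose.
    table[slot_for(key)] = (key, value)


def get(key: str):
    entry = table[slot_for(key)]
    # Compare the stored key, because a different key may hash to the same slot.
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


put("alice", "alice@example.com")
print(get("alice"))  # alice@example.com
print(get("bob"))    # None
```

The lookup never scans the table: it computes one index and inspects one slot.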
Key Terms
Hash Value
The hash value is the value returned after passing the plain text through a hash function.
Hash Key
The hash key is the specific value passed into the hash function in order to generate the returned hash value.
Hash Function
The hash function is the method by which plain text is converted to a hash value. It contains all the logic that transforms the input into a fixed-length value which, for a cryptographic hash, is hard to trace back to the original data.
Collisions
Collisions occur when two different values, passed into the same hash function, return the same hash value. Because a hash value has a fixed length, there are only finitely many possible hash values but far more possible inputs, so collisions are inevitable no matter how complex the function. The creator can take steps to reduce the number of collisions that occur, but cannot prevent them entirely.
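A deliberately weak hash function makes collisions easy to see. The toy_hash function below is invented for this illustration: it adds up the character codes and takes the remainder for a table of 10 slots, so any two anagrams collide.

```python
def toy_hash(text: str, table_size: int = 10) -> int:
    """Toy hash for illustration only: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in text) % table_size


# Anagrams contain the same characters, so their sums (and hashes) are equal.
print(toy_hash("listen") == toy_hash("silent"))  # True

# With only 10 possible hash values, 11 distinct inputs must share at least one.
values = [toy_hash(f"key{i}") for i in range(11)]
print(len(set(values)) < len(values))  # True
```

A better function spreads values more evenly, but the second check holds for any function with 10 possible outputs.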
Handling Collisions
Rehashing / Closed
Linear rehashing moves a colliding item from its hash value to the next available location. This requires additional logic in the lookup routine, because the hash value no longer guarantees the location where the data is stored: if the expected location holds a different item, the routine must check the following locations in turn until it finds the item or reaches an empty location.
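The following is a minimal sketch of linear rehashing, not a complete implementation. The class name, the table size of 8 and the toy hash (sum of character codes) are all assumptions made for the example, and deleting entries is not supported.

```python
class LinearProbingTable:
    """Fixed-size table that resolves collisions by moving to the next free slot."""

    def __init__(self, size: int = 8):
        self.size = size
        self.slots = [None] * size

    def _home(self, key: str) -> int:
        # Toy hash for illustration: sum of character codes modulo the table size.
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key: str, value) -> int:
        """Store the pair and return the slot index actually used."""
        index = self._home(key)
        for _ in range(self.size):
            entry = self.slots[index]
            if entry is None or entry[0] == key:
                self.slots[index] = (key, value)
                return index
            index = (index + 1) % self.size  # occupied: try the next slot, wrapping round
        raise RuntimeError("table is full")

    def get(self, key: str):
        index = self._home(key)
        for _ in range(self.size):
            entry = self.slots[index]
            if entry is None:
                return None  # an empty slot means the key was never stored
            if entry[0] == key:
                return entry[1]
            index = (index + 1) % self.size  # a different key lives here: keep looking
        return None


table = LinearProbingTable()
first = table.put("listen", 1)
second = table.put("silent", 2)  # same home slot as "listen", so it moves one along
print(first, second)
print(table.get("listen"), table.get("silent"))
```

Both put and get start from the same home slot and step forward in the same order, which is what lets the lookup find an item that was displaced by a collision.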
Rehashing can also mean building a new hash table with a new algorithm and rehashing all of the stored values using that algorithm.
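Rebuilding a table might look something like the sketch below. It assumes two made-up hash functions (sum_hash as the old algorithm, poly_hash as the new one) and reuses linear rehashing for inserts. The function names and table sizes are illustrative only.

```python
def sum_hash(key: str) -> int:
    """Old algorithm (toy): sum of character codes, so anagrams always collide."""
    return sum(ord(ch) for ch in key)


def poly_hash(key: str) -> int:
    """New algorithm (toy): character order now affects the result."""
    h = 0
    for ch in key:
        h = h * 31 + ord(ch)
    return h


def insert(slots, hash_fn, key, value):
    size = len(slots)
    index = hash_fn(key) % size
    for _ in range(size):
        if slots[index] is None or slots[index][0] == key:
            slots[index] = (key, value)
            return
        index = (index + 1) % size
    raise RuntimeError("table is full")


def lookup(slots, hash_fn, key):
    size = len(slots)
    index = hash_fn(key) % size
    for _ in range(size):
        entry = slots[index]
        if entry is None:
            return None
        if entry[0] == key:
            return entry[1]
        index = (index + 1) % size
    return None


def rebuild(old_slots, new_size, new_hash_fn):
    """Create a new table and re-insert every stored pair using the new algorithm."""
    new_slots = [None] * new_size
    for entry in old_slots:
        if entry is not None:
            insert(new_slots, new_hash_fn, entry[0], entry[1])
    return new_slots


old = [None] * 4
for word in ("listen", "silent", "enlist"):
    insert(old, sum_hash, word, word.upper())

new = rebuild(old, 16, poly_hash)
print([lookup(new, poly_hash, w) for w in ("listen", "silent", "enlist")])
```

After the rebuild, every lookup must use the new function and the new table size, because the old positions no longer mean anything.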
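Linked List of Collided Values / Open
Each location in the table holds a linked list, and values that collide at the same location are added to that list. A lookup hashes to the location and then walks the list until it finds the matching value.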
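A minimal sketch of this approach, under the assumption that each table slot points to the head of a singly linked list. Node, ChainedTable and the toy hash (sum of character codes) are names and choices made up for the example.

```python
class Node:
    """One entry in a slot's linked list."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedTable:
    def __init__(self, size: int = 8):
        self.size = size
        self.buckets = [None] * size  # each slot holds the head of a linked list

    def _index(self, key: str) -> int:
        # Toy hash for illustration: sum of character codes modulo the table size.
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key: str, value) -> None:
        index = self._index(key)
        node = self.buckets[index]
        while node is not None:
            if node.key == key:
                node.value = value  # key already stored: update it in place
                return
            node = node.next
        # New key: link it in at the head of this slot's list.
        self.buckets[index] = Node(key, value, self.buckets[index])

    def get(self, key: str):
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def chain_length(self, index: int) -> int:
        count = 0
        node = self.buckets[index]
        while node is not None:
            count += 1
            node = node.next
        return count


table = ChainedTable()
table.put("listen", 1)
table.put("silent", 2)  # collides with "listen" and joins the same list
print(table.get("listen"), table.get("silent"))
print(table.chain_length(table._index("listen")))
```

Unlike linear rehashing, a collided value never occupies another value's slot, so the table itself cannot fill up; the cost is that long lists slow lookups down.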
Example
PHP's crypt() function is a hashing function designed for password hashing that supports multiple methods and standards. It uses a salt: additional data that is combined with the value before hashing, so that the same password hashed with different salts produces different hash values.
From PHP.net Manual on crypt:
echo 'SHA-512: ' . crypt('rasmuslerdorf', '$6$rounds=5000$usesomesillystringforsalt$') . "<br>";
Output: SHA-512: $6$rounds=5000$usesomesillystri$D4IrlXatmP7rx3P3InaxBeoomnAihCKRVQP22JZ6EY47Wc6BkroIuUUBOov1i.S5KPgErtP/EN5mcO.ChWQW21
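The function hashes 'rasmuslerdorf' (Rasmus Lerdorf is the creator of PHP) using the salt string '$6$rounds=5000$usesomesillystringforsalt$' and outputs the result. The '$6$' prefix selects SHA-512 and 'rounds=5000' sets the number of hashing rounds. Only the first 16 characters of the salt text ('usesomesillystri') are used, as the output shows.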
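To show the effect of a salt outside PHP, here is a rough Python sketch. It does not reproduce crypt(): it uses a different scheme, PBKDF2-HMAC-SHA256 from the standard hashlib module, so its output is not comparable with the PHP example. The function names and the iteration count are assumptions chosen for the illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, hash). A random 16-byte salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt_a, hash_a = hash_password("rasmuslerdorf")
salt_b, hash_b = hash_password("rasmuslerdorf")

# Same password, different random salts: the stored hashes differ.
print(hash_a != hash_b)

print(verify_password("rasmuslerdorf", salt_a, hash_a))
print(verify_password("wrong guess", salt_a, hash_a))
```

The salt is stored alongside the hash, much as crypt() embeds it in its output string, so that the same calculation can be repeated when a password is checked.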