Hashing
Hashing
1 What is hashing?
Hashing is a method of more efficient storage and retrieval of information from an array (or array like object, like a file).
The idea:
- Each piece of data has a key attached to it, like a part number for a catalog item.
- We apply a function to the key to produce an array location. The function is called a hashing function. A very common function to use is key % array capacity, but there are others as well.
- If the array slot at the location is empty, then store the data there (and probably the key as well). If not, then a collision has occurred. There are various ways of (a) resolving collisions, so that the data gets stored, and (b) trying to minimize the number of collisions.
- To retrieve the data just reverse the process: hash the key, resolve and collisions that occur, identify the stored data with the key, and get the data that has been found.
2 Resolving collisions
Three ways:
- Sequential probing: If you hash a key to an index i and the slot there is occupied, then go to index i + 1, to i+2 if necessary, and so on.
Sequential probing works better if the size of the array is a prime number of the form 4k + 3, where k is an integer.
- Double hashing: If you hash a key to an index i using i % CAPACITY and the slot there is occupied, then go to index (i + hash2(key)) % CAPACITY, where hash2(key) = 1 + (key % (CAPACITY - 2)).
In this case CAPACITY and CAPACITY - 2 should be prime numbers.
- Chaining: Either the array is a two dimensional array, or the array is an array of linked lists. In the case of a collision, put the data and key in the first available slot in that column of the 2-D array, or add it to the end of the linked list.
File translated from TEX by TTH, version 2.25.
On 27 Nov 2002, 17:00.