Hashing Hashing

1  What is hashing?

Hashing is a method of more efficient storage and retrieval of information from an array (or array like object, like a file).

The idea:

  1. Each piece of data has a key attached to it, like a part number for a catalog item.

  2. We apply a function to the key to produce an array location. The function is called a hashing function. A very common function to use is key  % array capacity, but there are others as well.

  3. If the array slot at the location is empty, then store the data there (and probably the key as well). If not, then a collision has occurred. There are various ways of (a) resolving collisions, so that the data gets stored, and (b) trying to minimize the number of collisions.

  4. To retrieve the data just reverse the process: hash the key, resolve and collisions that occur, identify the stored data with the key, and get the data that has been found.

2  Resolving collisions

Three ways:

  1. Sequential probing: If you hash a key to an index i and the slot there is occupied, then go to index i + 1, to i+2 if necessary, and so on.

    Sequential probing works better if the size of the array is a prime number of the form 4k + 3, where k is an integer.

  2. Double hashing: If you hash a key to an index i using i  % CAPACITY and the slot there is occupied, then go to index (i + hash2(key))  % CAPACITY, where hash2(key) = 1 + (key  %  (CAPACITY - 2)).

    In this case CAPACITY and CAPACITY - 2 should be prime numbers.

  3. Chaining: Either the array is a two dimensional array, or the array is an array of linked lists. In the case of a collision, put the data and key in the first available slot in that column of the 2-D array, or add it to the end of the linked list.


File translated from TEX by TTH, version 2.25.
On 27 Nov 2002, 17:00.