February 6, 2013

Hash Table 3 -- Hashing

The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. 
Given a key (e) and an array--hashes with a size of arraySize, the algorithm-hash function (h) computes an entries to map the key to an index (i) between {0, arraySize-1}, which suggests where the entry can be found.

[1]
[1] Image from Wikipedia, A hash function that maps names to integers from 0 to 15.
There is a collision between keys "John Smith" and "Sandra Dee".

Given:

        e: key
        h: hash function
arraySize: the size of the array
        i: integer between {0, arraySize-1}

Then:

       i = h(e, arraySize)

Often above function is deduced from two steps:

hashCode = h(e)
       i = hashCode % arraySize

1.According to the hash function (h), we got the integer type hashCode derived from (e).
2.Then we got the index which is hashCode % arraySize, because the return number must fall in {0, array_size – 1}, by using a remainder operation (%).

Usually, the range of (e) values is much bigger than arraySize. We can image there might be a worse situation – for two different keys (e1) and (e2), if they have h(e1, arraySize) == h(e2, arraySize), then (e1) and (e2) are mapped into the same index of an array. This calls Collision.

Obviously, the most important part in this calculation is hashCode. The better hash function, the better hashCode is and the less collision we got. 
    Generally, a good hash function requires,
  • Easy to calculate, which means time complexity of hash function is Constant Time or Fixed Time.
  • Uniformly distribute keys to array to reduce the chance of collision.

No comments:

Post a Comment