Why Are HashMaps Implemented Using Powers of Two?


My question is why the HashMap bucket size is a power of 2. I have gone through many answers on Stack Overflow, but I am still not convinced. My reasons are:

  1. I have read that having the capacity be a power of 2 makes the & operation more efficient for calculating the index, so my question is: how exactly is it useful here? I could have a size that is a power of 3 and still perform the & operation like this: (hash) & (length - 1). So why exactly should it be a power of 2?

  2. Also, if the capacity is not a power of 2, why do I need to do a remainder operation instead?



When you subtract 1 from a number that is a power of 2, you get a number whose binary representation is all 1s. For example, 16 is a power of 2; if you subtract 1 from it, you get 15, whose binary representation is 1111. Now, if you do a bitwise AND of any number with 1111, you get the last 4 bits of that number, which is equivalent to taking the number modulo 16 (for a non-negative number). Division is usually an expensive operation, so the bitwise operation is preferred over it. Those last 4 bits evaluate to a value from 0 to 15, which are exactly the indexes of your underlying array.
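
To make that concrete, here is a minimal, standalone sketch (not the actual HashMap source) that computes the bucket index with a bitwise AND for a power-of-two capacity and compares it with the equivalent modulo result:

    public class PowerOfTwoIndex {
        public static void main(String[] args) {
            int capacity = 16;  // power of two, so capacity - 1 == 0b1111
            int[] sampleHashes = {7, 23, 1234567, -42, Integer.MAX_VALUE};

            for (int hash : sampleHashes) {
                int byAnd = hash & (capacity - 1);          // keeps only the low 4 bits
                int byMod = Math.floorMod(hash, capacity);  // same result, via division
                System.out.println(hash + " -> " + byAnd + " (floorMod gives " + byMod + ")");
            }
        }
    }

Note that Math.floorMod is used instead of % so that negative hash codes also map into 0 to 15, which is what the masking trick gives you for free.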

You could make the size 17 instead. In that case, after subtracting 1 you'd get 16, which is 10000 in binary. Now, if you do a bitwise AND of a number with 16, you lose all of its bits except the 5th bit from the end. So, regardless of the number, the array index will be either 16 or 0. That means you'd have a lot of collisions, which in turn means poor performance. Instead of O(1) for retrieval, you'd need O(log n), because when enough collisions occur, the nodes in a given bucket are stored in a red-black tree. Not only that: if you are using a ConcurrentHashMap in a multithreaded environment, you'd experience a lot of contention, because all new additions would end up in a very small number of buckets (only two, 0 and 16, in the above case), and when you add a new node to a bucket that already has other nodes, the bucket is locked to avoid data inconsistencies caused by modifications from multiple threads. Therefore, other threads trying to add new nodes have to wait until the current thread releases the lock.
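
To see the collision problem directly, here is a small illustrative sketch (the capacity of 17 is just the hypothetical size from above): masking with 16 (binary 10000) can only ever produce bucket 0 or bucket 16, no matter what the hash is:

    public class BadCapacityDemo {
        public static void main(String[] args) {
            int capacity = 17;            // not a power of two
            int mask = capacity - 1;      // 16 == 0b10000, only one bit set

            for (int hash = 0; hash < 40; hash++) {
                int index = hash & mask;  // only ever 0 or 16
                System.out.println("hash " + hash + " -> bucket " + index);
            }
        }
    }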

Finally, I should also mention that the Java HashMap implementation shifts the key's hash code 16 bits to the right and XORs it with the original hash code before doing the bitwise AND with (length - 1), to ensure that the effect of the higher-order bits is also captured.
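
For reference, that spreading step looks roughly like the following (a paraphrase of java.util.HashMap#hash from JDK 8+, wrapped in a runnable class for illustration):

    public class HashSpreadDemo {
        // Paraphrased from java.util.HashMap#hash (JDK 8+): XOR the upper 16 bits
        // of the hash code into the lower 16 so that high-order bits still affect
        // the bucket index after masking with (length - 1).
        static int spread(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int capacity = 16;
            String key = "example";
            int index = spread(key) & (capacity - 1);
            System.out.println("\"" + key + "\" -> bucket " + index);
        }
    }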

So, basically, the point is: if the size is a power of two, the keys will be distributed more evenly across the array with minimal collisions, leading to better retrieval performance (and less lock contention in the case of ConcurrentHashMap) compared with any size that is not a power of 2.
