HashMap - how it works?

With maps we can access stored objects (values) using keys assigned to them. One of Map interface implementation is HashMap class. In this article we will have a look how HashMap can be used and how it works under the hood. 


Basic Usage

Lets have a look at basic usage of HashMap:


We have used HashMap to be able to quickly map a person's name to his car. It is a typical use case of a map.  As you can see Car class overrides two methods from Object class - hashCode() and equals(). Those will be used by HashMap internally, as described in next section. It is important to remember two things:

  1. hashCode() is used to assign a number to an object. 
  2. There is a contract between  hashCode() and equals() which must be fulfilled - when equals() returnes true for compared objects, then hashCode()values must be the same for both objects.
To learn more about hashCode() have a look at the artcile - hashCode() - how it works?


How it works under the hood?


Lets have a look step by step, what happens under the hood of HashMap. When we call put(key, value) method, hashCode() method is called for the key value. The returned value is hashed again to compute the index value of internal table, where the data should be stored. Under this index HashMap will store a new entry which is a composition of both key and value. This is a simple explanation of put(key, value) method. When we want to retrieve a value, we should provide a key, based on which HashMap calculates an index and retrieves a value. To clearify it more, below is the draft representaing HashMap in terms of retrieving/storing values.

















On the draft we can see that there can be more than one entry in a bucket (under an index). It is caused by a collision of hashes. It can happen that the hash computed from  hashCode()value, which is used to determine index of internal table, will be the same as previously computed hash for different value. It means that a collision will occur - there would be two entries for only one index in table. It is possible since the same hash code can occur for different objects (vice versa is not possible - the same objects will never have different hash values). In case of collision HashMap implementation puts a linked list  (tree map from Java 8) inside given index. This list will contain all the entries for keys that happen to have the same hash. To retrieve entry that is inside linked list, HashMap implementation iterates over the list comparing given key with each entry's key using equals() method. Value of entry with key equal to key of object we are searching for will be returned as a result. 

Performance

Due to the fact that after computing hash value HashMap obtains a proper table index, the performance of entry retrieval is O(1). However in case of collision there is a need to iterate over linked list which leads to worse performance. Typically it would be O(n), however since Java 8 there has been some improvements (substitution of linked lists with tree map) which lead to O(logn) performance in worst case scenarios. 

Thread safety

HashMap is not thread safe. When we let multiple threads manipulate its content, then we could end up with incorrect data. To make HashMap thread safe we can use Collections class:


Collections.synchronizedMap(...) returns a map which is thread safe due to synchronization of internal map method calls. We have to remember that if we try to use the original map passed as parameter to synchronizedMap, then it will not be thread safe. We have to use a wrapper returned by synchronizedMap method.

A thread safe alternative could be ConcurrentHashMap which is a separate implementation, however based on the same ideas as those used in HashMap. It is thread safe and has better performance results then map returned by Collections.synchronizedMap(...) method.





Komentarze

Popularne posty z tego bloga

Spring Data 1# - how repositories work under the hood

Hibernate 1# - entity states overview

Data consistency and performance in web applciation