How to compute the hash of a tuple in python

  • A+
Category:Languages

In python, if I have a tuple with many elements, is its hash calculated from its elements' ids or its elements' content?

In this example,


a = (1, [1,2])
hash(a)

It errors out saying list is unhashable. So I guess it's not computed by id, or probably there is a check on whether the element is mutable.

Now see this example


class A: pass
a0 = A()
ta = (1, a0)
hash(ta) # -1122968024
a0.x = 20
hash(ta) # -1122968024

Here it turns out the hash of ta does not change with the modification of its element, i.e., a0. So maybe a0's id is used for the hash calculation? Is a0 somehow considered as immutable? How does python know if a type is mutable?

Now consider this case


b = (1, 2)
id(b) # 3980742764
c = (1, 2)
id(c) # 3980732588
tb = (1, b)
tc = (1, c)
hash(tb) # -1383040070
hash(tc) # -1383040070

It seems the content of b and c are used for the hash calculation.

How should I understand these examples?


 

Neither. It is calculated on the basis of the hashes of these elements, not the contents (values).

Take a look at this paragraph in python's documentation glossary.

Whether something is hashable or not, and how it is hashed, depends on the implementation of its .__hash__() method. Python itself has no idea about mutability of an object.

In your first example, tuple happens to hash itself on the basis of its elements, while a list doesn't have a hash at all - the .__hash__() method is not implemented for it (and for a good reason). That's why a tuple with a list object inside of it is not hashable.

Now, having that in mind, let's have a look at python data model documentation, and what it has to say on the topic:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

That's why you don't have to define .__hash__() for your classes - python does it for you in this case. The default implementation doesn't take instance fields into the account though. That's why you can change the values inside your object without changing it's hash.

In this regard you're right - the (CPython) default implementation of the hashing function for custom classes relies on the id() of an object, and not on the values inside of it. It is an implementation detail, and it differs between Python versions though. In more recent versions of Python the relation between hash() and id() involves some randomization.


But how does it actually hash itself?

While the details are quite complicated and probably involve some advanced math, the implementation of the hash function for tuple objects is written in C, and can be seen here (see static Py_hash_t tuplehash(PyTupleObject *v).

The calculation involves XORing a constant with the hashes of each of the tuple's elements. The line responsible for hashing of the elements is this one:

y = PyObject_Hash(*p++); 

So, to answer your original question: it does a bunch of XOR hokus-pocus with the hashes of each of it's elements. Whether or not the contents of these elements are used depends on their specific hash functions.

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: