In python, if I have a tuple with many elements, is its hash calculated from its elements'
ids or its elements' content?
In this example,
a = (1, [1,2])
It errors out saying list is unhashable. So I guess it's not computed by id, or probably there is a check on whether the element is mutable.
Now see this example
class A: pass
a0 = A()
ta = (1, a0)
hash(ta) # -1122968024
a0.x = 20
hash(ta) # -1122968024
Here it turns out the hash of
ta does not change with the modification of its element, i.e.,
a0. So maybe
a0's id is used for the hash calculation? Is
a0 somehow considered as immutable? How does python know if a type is mutable?
Now consider this case
b = (1, 2)
id(b) # 3980742764
c = (1, 2)
id(c) # 3980732588
tb = (1, b)
tc = (1, c)
hash(tb) # -1383040070
hash(tc) # -1383040070
It seems the content of
c are used for the hash calculation.
How should I understand these examples?
Neither. It is calculated on the basis of the hashes of these elements, not the contents (values).
Take a look at this paragraph in python's documentation glossary.
Whether something is hashable or not, and how it is hashed, depends on the implementation of its
.__hash__() method. Python itself has no idea about mutability of an object.
In your first example,
tuple happens to hash itself on the basis of its elements, while a
list doesn't have a hash at all - the
.__hash__() method is not implemented for it (and for a good reason). That's why a
tuple with a
list object inside of it is not hashable.
Now, having that in mind, let's have a look at python data model documentation, and what it has to say on the topic:
User-defined classes have
__hash__()methods by default; with them, all objects compare unequal (except with themselves) and
x.__hash__()returns an appropriate value such that
x == yimplies both that
x is yand
hash(x) == hash(y).
That's why you don't have to define
.__hash__() for your classes - python does it for you in this case. The default implementation doesn't take instance fields into the account though. That's why you can change the values inside your object without changing it's hash.
In this regard you're right - the (CPython) default implementation of the hashing function for custom classes relies on the
id() of an object, and not on the values inside of it. It is an implementation detail, and it differs between Python versions though. In more recent versions of Python the relation between
id() involves some randomization.
But how does it actually hash itself?
While the details are quite complicated and probably involve some advanced math, the implementation of the hash function for tuple objects is written in C, and can be seen here (see
static Py_hash_t tuplehash(PyTupleObject *v).
The calculation involves XORing a constant with the hashes of each of the tuple's elements. The line responsible for hashing of the elements is this one:
y = PyObject_Hash(*p++);
So, to answer your original question: it does a bunch of XOR hokus-pocus with the hashes of each of it's elements. Whether or not the contents of these elements are used depends on their specific hash functions.