Why does an empty string in Python sometimes take up 49 bytes and sometimes 51?


I tested sys.getsizeof('') and sys.getsizeof(' ') in three environments. In two of them, sys.getsizeof('') returns 51 bytes instead of 49 bytes, which is one byte more than sys.getsizeof(' ') returns:

Screenshots:

Win8 + Spyder + CPython 3.6:


Win8 + Spyder + IPython 3.6:


Win10 (VPN remote) + PyCharm + CPython 3.7:


First edit

I ran a second test in python.exe directly instead of Spyder and PyCharm (those two still show 51), and there everything looks as expected. Apparently I don't have the expertise to solve this problem, so I'll leave it to you guys :)

Win10 + Python 3.7 console versus PyCharm using same interpreter:


Win8 + IPython 3.6 + Spyder using same interpreter:


 


This sounds like something is retrieving the wchar representation of the string object. As of CPython 3.7, an empty string is normally stored in the "compact ASCII" representation. On a 64-bit build, the base data and padding for a compact ASCII string come to 48 bytes, plus one byte of string data (just the null terminator), for 49 bytes in total. The layout is defined in CPython's unicodeobject.h header.
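As a rough check of the "fixed overhead plus one byte per character" layout, here is a minimal sketch that measures a few ASCII strings. It assumes CPython; the helper name compact_ascii_sizes is just for illustration, and the absolute numbers depend on the interpreter version and build, so only the differences between sizes are meaningful.

```python
import sys

def compact_ascii_sizes(lengths=(0, 1, 2, 10)):
    """Map each length n to sys.getsizeof of an n-character ASCII string."""
    return {n: sys.getsizeof("x" * n) for n in lengths}

sizes = compact_ascii_sizes()

# Total for '' minus its one-byte null terminator gives the fixed overhead
# (48 on the 64-bit CPython 3.7 build discussed above, if nothing has
# cached extra data on the empty string).
fixed_overhead = sizes[0] - 1

for n, size in sizes.items():
    # Each extra ASCII character should add exactly one byte.
    print(f"len={n:2d}  getsizeof={size}  beyond overhead={size - fixed_overhead}")
```

If the empty string reports more than one byte less than a one-character string would suggest, something in that environment has probably attached extra data to it.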

For now (this is scheduled for removal in 4.0), there is also an option to retrieve a wchar_t representation of a string. On a platform with a 2-byte wchar_t, such as Windows, the wchar representation of an empty string is 2 bytes (just the null terminator again). The wchar representation is cached on the string object on first access, and str.__sizeof__ accounts for this extra data when it exists, so the total becomes 49 + 2 = 51 bytes.
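The arithmetic can be written out as a small model. This is only a sketch of the accounting described above, not CPython's actual code: modeled_sizeof is a hypothetical helper, and the 48-byte header is the 64-bit CPython 3.7 figure quoted in this answer.

```python
def modeled_sizeof(n_ascii_chars, header=48, wchar_size=None):
    """Model the reported size of a compact ASCII str.

    header     -- fixed overhead in bytes (assumed: 64-bit CPython 3.7)
    wchar_size -- sizeof(wchar_t) if a wchar_t copy has been cached,
                  or None if no such copy exists
    """
    # One byte per ASCII character plus a one-byte null terminator.
    size = header + n_ascii_chars + 1
    if wchar_size is not None:
        # The cached copy holds every character plus its own terminator.
        size += (n_ascii_chars + 1) * wchar_size
    return size

print(modeled_sizeof(0))                # 49: plain empty string
print(modeled_sizeof(1))                # 50: ' '
print(modeled_sizeof(0, wchar_size=2))  # 51: empty string with a cached 2-byte wchar_t copy
```

Under this model, an IDE that touches the shared empty string through a wchar_t-based API once would leave it at 51 bytes for the rest of the session, while a freshly started python.exe would still report 49.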
