Why can a 352GB NumPy ndarray be used on an 8GB memory macOS computer?

import numpy as np

array = np.zeros((210000, 210000))  # default dtype is numpy.float64
array.nbytes

When I run the above code on my 8GB MacBook running macOS, no error occurs. But when I run the same code on a 16GB Windows 10 PC, a 12GB Ubuntu laptop, or even a 128GB Linux supercomputer, the Python interpreter raises a MemoryError. All the test environments run 64-bit Python 3.6 or 3.7.
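For reference, the requested buffer size can be computed without allocating anything; it works out to roughly 352.8 GB of float64 zeros:

>>> 210000 * 210000 * 8   # elements × 8 bytes per float64
352800000000
>>> _ / 10 ** 9            # ≈ 352.8 GB requested
352.8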

 


You are most likely using Mac OS X Mavericks or newer, so 10.9 or up. From that version onwards, macOS uses virtual memory compression: memory that exceeds your physical RAM is not only redirected to pages on disk, those pages are also compressed to save space.
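If you want to watch the compressor at work while the script runs, one rough way (macOS only, simply shelling out to the stock vm_stat tool) is to print the compressor page counts:

import subprocess

# macOS only: vm_stat reports how many pages the in-kernel memory compressor holds.
out = subprocess.check_output(["vm_stat"]).decode()
for line in out.splitlines():
    if "compressor" in line:
        print(line.strip())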

For your ndarray, you may have requested ~332GB of memory, but it's all a contiguous sequence of NUL bytes at the moment, and that compresses really, really well:

[Screenshot: Activity Monitor showing the Python process's memory details]

That's a screenshot from the Activity Monitor tool, showing the details of my Python process where I replicated your test (use the (i) icon on the toolbar to open it). On the Memory tab you can see that the Real Memory Size column is only 9.3 MB, against a Virtual Memory Size of 332.71 GB.
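To get a feel for just how well a run of NUL bytes compresses, here is a quick illustration with zlib (the kernel compressor uses its own algorithm, so this is only indicative):

import zlib

# One million NUL bytes shrink to roughly a kilobyte; a zero-filled array
# therefore costs almost no real (compressed) memory.
chunk = bytes(10 ** 6)
print(len(zlib.compress(chunk)))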

Once you start setting other values for those indices, you'll quickly see the memory stats increase to gigabytes instead of megabytes:

while True:
    index = tuple(np.random.randint(array.shape[0], size=2))
    array[index] = np.random.uniform(-10 ** -307, 10 ** 307)

or you can push the limit further by assigning to every index (in batches, so you can watch the memory grow):

array = array.reshape((-1,))
for i in range(0, array.shape[0], 10**5):
    array[i:i + 10**5] = np.random.uniform(-10 ** -307, 10 ** 307, 10**5)

The process is eventually terminated; my MacBook Pro doesn't have enough swap space to store gigabytes of hard-to-compress random data:

>>> array = array.reshape((-1,))
>>> for i in range(0, array.shape[0], 10**5):
...     array[i:i + 10**5] = np.random.uniform(-10 ** -307, 10 ** 307, 10**5)
...
Killed: 9
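If you want to see the swap filling up before the kill, one option on macOS (assuming the stock sysctl tool) is to poll vm.swapusage while the loop runs:

import subprocess

# macOS only: vm.swapusage reports total/used/free swap, which climbs as the
# incompressible random data is paged out to disk.
print(subprocess.check_output(["sysctl", "vm.swapusage"]).decode().strip())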

You could argue that macOS is being too trusting, letting programs request that much memory without bounds, but with memory compression the limits are much more fluid. Your np.zeros() array does fit on your system, after all. You probably don't have the swap space to store the uncompressed data, but compressed it all fits fine, so macOS allows the allocation and terminates processes that then take advantage of the generosity.

If you don't want this to happen, use resource.setrlimit() to set a limit on RLIMIT_STACK to, say, 2 ** 14, at which point the OS will segfault Python when it exceeds that limit.
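A minimal sketch of setting such a limit with the POSIX-only resource module (RLIMIT_AS below is my own assumption rather than the constant named above, and how strictly each OS enforces it for large allocations varies):

import resource

import numpy as np

# Assumption: cap the process's address space (RLIMIT_AS) instead of the stack;
# enforcement differs between Linux and macOS.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (2 ** 34, hard))  # ~16 GiB soft cap

try:
    array = np.zeros((210000, 210000))
except MemoryError:
    print("allocation refused by the limit")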
