We need to preallocate. But MATLAB does not preallocate the preallocation?

  • A+
Category:Languages

While testing if any() short-circuits (it does!) I found out the following interesting behavior when preallocating the test variable:

test=zeros(1e7,1); >> tic;any(test);toc Elapsed time is 2.444690 seconds. >> test(2)=1; >> tic;any(test);toc Elapsed time is 0.000034 seconds. 

However if I do:

test=ones(1e7,1); test(1:end)=0; tic;any(test);toc Elapsed time is 0.642413 seconds. >> test(2)=1; >> tic;any(test);toc Elapsed time is 0.000021 seconds. 

Turns out that this happens because the variable is not really on RAM until its completely filled with information, therefore the first test takes longer because it needs to allocate it. The way I checked this was by looking at the memory used in the Windows Task Manager.

While this may make some sense (do not initialize until its needed), what confused me a bit more is the following test, where the variable is filled in a for loop and at some point the execution is stopped.

test=zeros(1e7,1);  for ii=1:1e7     test(ii)=1;     if ii==1e7/2         pause     end end 

When checking the memory used by MATLAB, I could see how when stopped, it was using only 50% of test needed memory (if it was full). This can be reproduced with different % of memory quite solidly.

Interestingly the following does not allocate the entire matrix either.

test=zeros(1e7,1); test(end)=1; 

I know that MATLAB is not dynamically allocating and increasing the size of test in the loop, as that would make the end iterations very slow (due to the high memcopys that would need) and it would also allocate the entire array in this last test I proposed. So my question is:

What is going on?

Someone suggested that this can be related to virtual-memory vs physical-memory, and related to how the OS sees memory. Not sure how that links to the first test proposed here though. Any further explanation would be ideal.

Win 10 x64, MATLAB 2017a

 


This behavior is not unique to MATLAB. In fact, MATLAB has no control over it, as it is Windows that causes it. Linux and MacOS show the same behavior.

I had noticed this exact same thing in a C program many years ago. It turns out that this is well documented behavior. This excellent answer explains in gory details how memory management works in most modern OSes (thanks Amro for sharing the link!). Read it if this answer doesn't have enough detail for you.

First, let's repeat Ander's experiment in C:

#include <stdlib.h> #include <stdio.h> #include <unistd.h>  int main (void) {     const int size = 1e8;     /* For Linux: */    // const char* ps_command = "ps --no-headers --format /"rss vsz/" -C so";    /* For MacOS: */    char ps_command[128];    sprintf(ps_command, "ps -o rss,vsz -p %d", getpid());     puts("At program start:");    system(ps_command);     /* Allocate large chunck of memory */     char* mem = malloc(size);     puts("After malloc:");    system(ps_command);     for(int ii = 0; ii < size/2; ++ii) {       mem[ii] = 0;    }     puts("After writing to half the array:");    system(ps_command);     for(int ii = size/2; ii < size; ++ii) {       mem[ii] = 0;    }     puts("After writing to the whole array:");    system(ps_command);     char* mem2 = calloc(size, 1);     puts("After calloc:");    system(ps_command);     free(mem);    free(mem2); } 

The code above works on a POSIX-compliant OS (i.e. any OS except Windows), but on Windows you can use Cygwin to become (mostly) POSIX-compliant. You might need to change the ps command syntax depending on your OS. Compile with gcc so.c -o so, run with ./so. I see the following output on MacOS:

At program start:    RSS      VSZ    800  4267728 After malloc:    RSS      VSZ    816  4366416 After writing to half the array:    RSS      VSZ  49648  4366416 After writing to the whole array:    RSS      VSZ  98476  4366416 After calloc:    RSS      VSZ  98476  4464076 

There are two columns displayed, RSS and VSZ. RSS stands for "Resident set size", it is the amount of physical memory (RAM) that the program is using. VSZ stands for "Virtual size", it is the size of the virtual memory assigned to the program. Both quantities are in KiB.

The VSZ column shows 4 GiB at program start. I'm not sure what that is about, it seems over the top. But the value grows after malloc and again after calloc, both times with approximately 98,000 KiB (slightly over the 1e8 bytes we allocated).

In contrast, the RSS column shows an increase of only 16 KiB after we allocated 1e8 bytes. After writing to half the array, we have a bit over 5e7 bytes of memory in use, and after writing to the full array we have a bit over 1e8 bytes in use. Thus, the memory gets assigned as we use it, not when we first ask for it. Next, we allocate another 1e8 bytes using calloc, and see no change in the RSS. Note that calloc returns a memory block that is initialized to 0, exactly like MATLAB's zeros does.

I am talking about calloc because it is likely that MATLAB's zeros is implemented through calloc.

Explanation:

Modern computer architectures separate virtual memory (the memory space that a process sees) from physical memory. The process (i.e. a program) uses pointers to access memory, these pointers are addresses in virtual memory. These addresses are translated by the system into physical addresses when used. This has many advantages, for example it is impossible for one process to address memory assigned to another process, since none of the addresses it can generate will ever be translated to physical memory not assigned to that process. It also allows the OS to swap out memory of an idling process to let another process use that physical memory. Note that the physical memory for a contiguous block of virtual memory doesn't need to be contiguous!

The key is the bolded italic text above: when used. Memory assigned to a process might not actually exist until the process tries to read from or write to it. This is why we don't see any change in RSS when allocating a large array. Memory used is assigned to physical memory in pages (blocks typically of 4 KiB, sometimes up to 1 MiB). So when we write to one byte of our new memory block, only one page gets assigned.

Some OSes, like Linux, will even "overcommit" memory. Linux will assign more virtual memory to processes than it has the capacity to put into physical memory, under the assumption that those processes will not use all the memory they are assigned anyway. This answer will tell you more over overcommitting than you will want to know.

So what happens with calloc, which returns zero-initialized memory? This is also explained in the answer I linked earlier. For small arrays malloc and calloc return a block of memory from a larger pool obtained from the OS at the start of the program. In this case, calloc will write zeros to all bytes to make sure it is zero-initialized. But for larger arrays, a new block of memory is directly obtained from the OS. The OS always gives out memory that is zeroed out (again, it prevents one program to see data from another program). But because the memory doesn't get physically assigned until used, the zeroing out is also delayed until a memory page is put into physical memory.

Back to MATLAB:

The experiment above shows that it is possible to obtain a zeroed-out block of memory in constant time and without changing the physical size of a program's memory. This is how MATLAB's function zeros allocates memory without you seeing any change in MATLAB's memory footprint.

The experiment also shows that zeros allocates the full array (likely through calloc), and that memory footprint only increases as this array is used, one page at a time.

The preallocation advice by the MathWorks states that

you can improve code execution time by preallocating the maximum amount of space required for the array.

If we allocate a small array, then want to increase its size, a new array has to be allocated and data copied over. How the array is associated to RAM has no influence on this, MATLAB only sees virtual memory, it has no control (or even knowledge?) of where in the physical memory (RAM) these data are stored. All that matters for an array from MATLAB's point of view (or that of any other program) is that the array is a contiguous block of virtual memory. Enlarging an existing block of memory is not always (usually not?) possible, and so a new block is obtained and data copied over. For example, see the graph in this other answer: when the array is enlarged (this happens at the large vertical spikes) data is copied; the larger the array, the more data needs to be copied.

Preallocating avoids enlarging the array, as we make it large enough to begin with. In fact, it is more efficient to make an array that is way too large for what we need, as the portion of the array that we don't use is actually never really given to the program. That is, if we allocate a very large block of virtual memory, and only use the first 1000 elements, we'll only really use a few pages of physical memory.

The behavior of calloc described above explains also this other strange behavior of the zeros function: For small arrays, zeros is more expensive than for large arrays, because small arrays need to be zeroed explicitly by the program, whereas large arrays are implicitly zeroed by the OS.

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: