Confusion about different running times of two algorithms in C

  • A+

I have an array, long matrix[8*1024][8*1024], and two functions sum1 and sum2:

long sum1(long m[ROWS][COLS]) {     long register sum = 0;     int i,j;      for (i=0; i < ROWS; i++) {         for (j=0; j < COLS; j++) {             sum += m[i][j];         }     }     return sum; }  long sum2(long m[ROWS][COLS]) {     long register sum = 0;     int i,j;      for (j=0; j < COLS; j++) {         for (i=0; i < ROWS; i++) {             sum += m[i][j];         }     }      return sum; } 

When I execute the two functions with the given array, I get running times:

sum1: 0.19s

sum2: 1.25s

Can anyone explain why there is this huge difference?


Computers generally use cache to help speed up access to main memory.

The hardware usually used for main memory is relatively slow—it can take many processor cycles for data to come from main memory to the processor. So a computer generally includes a smaller amount very fast but expensive memory called cache. Computers may have several levels of cache, some of it built into the processor or the processor chip itself and some of it outside the processor chip.

Since the cache is smaller, it cannot hold everything in main memory. It often cannot even hold everything that one program is using. So the processor has to make decisions about what is kept in cache.

The most frequent accesses of a program are to consecutive locations in memory. Very often, after a program reads element 237 of an array, it will soon read 238, then 239, and so on. It is less often that it reads 7024 just reading 237.

So the operation of cache is designed to keep portions of main memory that are consecutive in cache. Your sum1 program works well with this because it changes the column index most rapidly, keeping the row index constant while all the columns are processed. The array elements it accesses are laid out consecutively in memory.

Your sum2 program does not work well with this because it changes the row index most rapidly. This skips around in memory, so many of the accesses it makes are not satisfied by cache and have to come from slower main memory.


:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: