- A+

Is there any possible optimization for random access on a very big array (I currently use `uint8_t`

, and I'm asking about whats better)

`uint8_t MyArray[10000000]; `

when the value at any position in the array is

**0**or**1**for*95%*of all cases,**2**in*4%*of cases,- between
**3**and**255**in the other*1%*of cases?

So, is there anything better than a `uint8_t`

array to use for this? It should be as quick as possible to loop over the whole array in a random order, and this is very heavy on RAM bandwidth, so when having more than a few threads doing that at the same time for different arrays, currently the whole RAM bandwidth is quickly saturated.

I'm asking since it feels very inefficient to have such a big array (10 MB) when it's actually known that almost all values, apart from 5%, will be either 0 or 1. So when 95% of all values in the array would only actually need 1 bit instead of 8 bit, this would reduce memory usage by almost an order of magnitude. It feels like there has to be a more memory efficient solution that would greatly reduce RAM bandwidth required for this, and as a result also be significantly quicker for random access.

A simple possibility that comes to mind is to keep a compressed array of 2 bit per value for the common cases, and a separated 4-byte per value (24 bit for original element index, 8 bit for actual value, so `(idx << 8) | value)`

) sorted array for the other ones.

When you look up a value, you do first a lookup in the 2bpp array (O(1)); if you find 0, 1 or 2 it's the value you want, if you find 3 it means that you have to look it up in the secondary array. Here you'll perform a binary search to look for the *index* of your interest left-shifted of 8 (O(log(n) with a small n, as this should be the 1%), and extract the value from the 4-byte thingie.

`uint8_t *main_arr = ...; uint32_t sec_arr = ...; uint32_t sec_arr_size = ...; uint8_t lookup(unsigned idx) { // extract the 2 bits of our interest from the main array uint8_t v = (main_arr[idx>>2]>>(2*(idx&3)))&3; // usual (likely) case: value between 0 and 2 if(v != 3) return v; // bad case: lookup the index<<8 in the secondary array // lower_bound finds the first >=, so we don't need to mask out the value uint32_t *ptr = std::lower_bound(sec_arr, sec_arr+sec_arr_size, idx<<8); #ifdef _DEBUG // some coherency checks if(ptr == sec_arr+sec_arr_size) std::abort(); if((*ptr >> 8) != idx) std::abort(); #endif // extract our 8-bit value from the 32 bit (index, value) thingie return (*ptr) & 0xff; } `

For an array such as the one you proposed, this should take 10000000 / 4 = 2500000 bytes for the first array, plus 10000000 * 1% * 4 B = 400000 bytes for the second array; hence, 2900000 bytes, i.e. less than one third of the original array, and the most used portion is all kept together in memory, which should be good for caching (it may even fit L3).

If you need more than 24-bit addressing, you'll have to tweak the "secondary storage"; a trivial way to extend it is to have a 256 elements pointer array to switch over the top 8 bits of the index and forward to a 24-bit indexed sorted array as above.