How to properly access mapped memory without undefined behavior in C++


I've been trying to figure out how to access a mapped buffer from C++17 without invoking undefined behavior. For this example, I'll use a buffer returned by Vulkan's vkMapMemory.

So, according to N4659 (the final C++17 working draft), section [intro.object] (emphasis added):

The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is created by a definition (6.1), by a new-expression (8.3.4), when implicitly changing the active member of a union (12.3), or when a temporary object is created (7.4, 15.2).

These are, apparently, the only valid ways to create a C++ object. So let's say we get a void* pointer to a mapped region of host-visible (and coherent) device memory (assuming, of course, that all the required arguments have valid values and the call succeeds, and the returned block of memory is of sufficient size and properly aligned):

void* ptr{};
vkMapMemory(device, memory, offset, size, flags, &ptr);
assert(ptr != nullptr);

Now, I wish to access this memory as a float array. The obvious thing to do would be to static_cast the pointer and go on my merry way as follows:

volatile float* float_array = static_cast<volatile float*>(ptr); 

(The volatile is included since this is mapped as coherent memory, and thus may be written by the GPU at any point). However, a float array doesn't technically exist in that memory location, at least not in the sense of the quoted excerpt, and thus accessing the memory through such a pointer would be undefined behavior. Therefore, according to my understanding, I'm left with two options:

1. memcpy the data

It should always be possible to use a local buffer, cast it to std::byte* and memcpy the representation over to the mapped region. The GPU will interpret it as instructed in the shaders (in this case, as an array of 32-bit float) and thus problem solved. However, this requires extra memory and extra copies, so I would prefer to avoid this.
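A minimal sketch of this memcpy approach, assuming the mapped range is large enough. The helper names `upload_floats` and `download_floats` are hypothetical and not part of Vulkan; `mapped` stands for the pointer returned by vkMapMemory and `mapped_size` for the size of the mapped range in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies the object representation of `src` into the
// mapped region that starts at `mapped` and is `mapped_size` bytes long.
inline void upload_floats(void* mapped, std::size_t mapped_size,
                          const std::vector<float>& src)
{
    const std::size_t bytes = src.size() * sizeof(float);
    assert(bytes <= mapped_size);
    // memcpy only copies bytes, so no float object has to exist at `mapped`.
    std::memcpy(mapped, src.data(), bytes);
}

// Hypothetical helper: copies `count` floats out of the mapped region into
// real float objects owned by the returned vector.
inline std::vector<float> download_floats(const void* mapped, std::size_t count)
{
    std::vector<float> dst(count);
    std::memcpy(dst.data(), mapped, count * sizeof(float));
    return dst;
}
```

The local vector is the extra memory mentioned above: every transfer costs one additional copy, in exchange for never touching the mapped region through a float pointer.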

2. placement-new the array

It appears that section [new.delete.placement] doesn't impose any restrictions on how the placement address is obtained (it need not be a safely-derived pointer regardless of the implementation's pointer safety). It should, therefore, be possible to create a valid float array via placement-new as follows:

volatile float* float_array = new (ptr) volatile float[sizeInFloats]; 

The pointer float_array should now be safe to access (within the bounds of the array, or one-past).


So, my questions are the following:

  1. Is the simple static_cast indeed undefined behavior?
  2. Is this placement-new usage well-defined?
  3. Is this technique applicable to similar situations, such as accessing memory-mapped hardware?

As a side note, I've never had an issue by simply casting the returned pointer, I'm just trying to figure out what the proper way to do this would be, according to the letter of the standard.

 


Is the simple static_cast indeed undefined behavior?

volatile float* float_array = static_cast<volatile float*>(ptr); 

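Yes, under C++17. The cast itself is fine; the undefined behavior comes from accessing the memory through float_array, because none of the constructs listed in [intro.object] has created a float object there.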

Is this placement-new usage well-defined?

volatile float* float_array = new (ptr) volatile float[sizeInFloats]; 

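Even though this looks well defined, it is implementation-dependent. An array new-expression is allowed to request extra space for bookkeeping overhead, even in the placement form, so the array may not start at ptr and may extend past the end of the mapped block. You cannot know how much overhead is added unless you check your toolchain documentation.

Note: If a class type with a non-trivial destructor were involved instead of float, you would have to call the destructor of each object explicitly before freeing the memory.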

A solution would be to manually build a sequence of floats:

float* p = static_cast<float*>(ptr);
for (std::size_t n = 0; n < sizeInFloats; ++n) {
    ::new (p + n) volatile float;
}

Another (better?) solution would be to rely on the Standard Library:

#include <memory>
auto p = const_cast<volatile float*>(static_cast<float*>(ptr));
std::uninitialized_default_construct(p, p + sizeInFloats);
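For reference, the per-element construction can be wrapped in a small function and tried on ordinary host memory. This is only a sketch: `create_floats` is a hypothetical name, and `storage` is assumed to be suitably aligned and at least `count * sizeof(float)` bytes long, as the question assumes for the mapped block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: creates `count` float objects one by one in the
// storage at `storage` and returns a pointer to the first of them.
inline volatile float* create_floats(void* storage, std::size_t count)
{
    // The storage must be aligned for float.
    assert(reinterpret_cast<std::uintptr_t>(storage) % alignof(float) == 0);

    unsigned char* bytes = static_cast<unsigned char*>(storage);
    volatile float* first = nullptr;
    for (std::size_t n = 0; n < count; ++n) {
        // Non-array placement new adds no bookkeeping overhead.
        // Default-initialization leaves each value indeterminate.
        volatile float* p = ::new (bytes + n * sizeof(float)) volatile float;
        if (n == 0) {
            first = p;
        }
    }
    return first;
}
```

Because each float is default-initialized, its value is indeterminate until it is written, so this fits buffers the host fills rather than buffers whose existing contents must be read back.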

Is this technique applicable to similar situations, such as accessing memory-mapped hardware?

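Memory-mapped hardware is another beast. This is really implementation-defined: the Standard leaves the mapping from an integer to a pointer up to the implementation, and it gives no guarantee that an object exists at a hardware address, so as far as the Standard alone is concerned, dereferencing such a pointer is undefined behavior. You should refer to your toolchain documentation.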
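As an illustration of the usual pattern, and not something the Standard guarantees, register access is commonly written through a volatile pointer formed from an integer address. The helper names `write_reg32` and `read_reg32` are hypothetical; on real hardware the address would come from the device documentation and the behavior from your toolchain's guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: writes a 32-bit value to the register at `address`.
inline void write_reg32(std::uintptr_t address, std::uint32_t value)
{
    assert(address % alignof(std::uint32_t) == 0);
    // volatile keeps the compiler from eliding or merging the store.
    *reinterpret_cast<volatile std::uint32_t*>(address) = value;
}

// Hypothetical helper: reads the 32-bit register at `address`.
inline std::uint32_t read_reg32(std::uintptr_t address)
{
    assert(address % alignof(std::uint32_t) == 0);
    return *reinterpret_cast<volatile std::uint32_t*>(address);
}
```

On a hosted system the helpers can be exercised safely by passing the address of an ordinary std::uint32_t variable converted to std::uintptr_t, which stands in for a device register.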
