Is it well-defined to hold a misaligned pointer, as long as you don't ever dereference it?


I have some C code that parses packed/unpadded binary data that comes in from the network.

This code was/is working fine under Intel/x86, but when I compiled it under ARM it would often crash.

The culprit, as you might have guessed, was unaligned pointers -- in particular, the parsing code would do questionable things like this:

    uint8_t buf[2048];
    [... code to read some data into buf...]
    int32_t nextWord = *((int32_t *) &buf[5]);  // misaligned access -- can crash under ARM!

... that's obviously not going to fly in ARM-land, so I modified it to look more like this:

    uint8_t buf[2048];
    [... code to read some data into buf...]
    int32_t * pNextWord = (int32_t *) &buf[5];
    int32_t nextWord;
    memcpy(&nextWord, pNextWord, sizeof(nextWord));  // slower but ARM-safe

My question (from a language-lawyer perspective) is: is my "ARM-fixed" approach well-defined under the C language rules?

My worry is that maybe even just having a misaligned-int32_t-pointer might be enough to invoke undefined behavior, even if I never actually dereference it directly. (If my concern is valid, I think I could fix the problem by changing pNextWord's type from (const int32_t *) to (const char *), but I'd rather not do that unless it's actually necessary to do so, since it would mean doing some pointer-stride arithmetic by hand)

 


No, it is not well-defined: the pointer conversion itself is undefined, even if the result is never dereferenced. C11 6.3.2.3p7:

  A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. [...]

Indeed, the code that you assume is ARM-safe might not even be Intel-safe: compilers are known to generate code for Intel that can crash on unaligned access. While that is not what happens in the linked case, a clever compiler could take the conversion as proof that the address is indeed aligned and use specialized code for memcpy.
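One way to sidestep the conversion entirely is to never form the int32_t pointer and hand memcpy a byte pointer instead. The following is only a minimal sketch of that idea; the helper name read_i32_at is hypothetical, and it assumes the bytes in the buffer are already in the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post): copy a 32-bit value
 * out of a byte buffer at an arbitrary offset. No int32_t pointer into
 * the buffer is ever created, so the alignment rule in 6.3.2.3p7 never
 * comes into play. Assumes the bytes are in host byte order. */
static int32_t read_i32_at(const uint8_t *buf, size_t offset)
{
    int32_t value;
    memcpy(&value, buf + offset, sizeof value);  /* byte-wise copy */
    return value;
}
```

With this, the line from the question would become something like `int32_t nextWord = read_i32_at(buf, 5);`, and the offset arithmetic stays in bytes.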


Alignment aside, the first snippet (the direct dereference) also suffers from a strict aliasing violation. C11 6.5p7:

  An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
    • a type compatible with the effective type of the object,
    • a qualified version of a type compatible with the effective type of the object,
    • a type that is the signed or unsigned type corresponding to the effective type of the object,
    • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    • a character type.

Since the array buf[2048] has a declared type, each element being uint8_t (a character type on typical implementations), the effective type of each element is that character type; you may access the contents of the array only as characters, not as int32_t.
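Reading the buffer strictly as characters could look like the sketch below, which assembles the value one byte at a time. The helper name load_be32 is hypothetical, and big-endian (network) byte order is an assumption here, since the question does not say which order the wire format uses.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): decode a 32-bit
 * value from four bytes, assuming big-endian (network) byte order.
 * The buffer is only ever read through a uint8_t lvalue, so neither
 * the alignment rule nor the aliasing rule is involved. */
static int32_t load_be32(const uint8_t *p)
{
    uint32_t u = ((uint32_t)p[0] << 24)
               | ((uint32_t)p[1] << 16)
               | ((uint32_t)p[2] << 8)
               |  (uint32_t)p[3];
    /* Converting an out-of-range uint32_t to int32_t is
     * implementation-defined rather than undefined. */
    return (int32_t)u;
}
```

A call such as `load_be32(&buf[5])` would then replace the cast-and-dereference, and it also makes the byte order explicit instead of depending on the host.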
