I have some C code that parses packed/unpadded binary data that comes in from the network.
This code was/is working fine under Intel/x86, but when I compiled it under ARM it would often crash.
The culprit, as you might have guessed, was unaligned pointers -- in particular, the parsing code would do questionable things like this:
    uint8_t buf[2048];
    [... code to read some data into buf...]
    int32_t nextWord = *((int32_t *) &buf[5]);  // misaligned access -- can crash under ARM!
... that's obviously not going to fly in ARM-land, so I modified it to look more like this:
    uint8_t buf[2048];
    [... code to read some data into buf...]
    int32_t * pNextWord = (int32_t *) &buf[5];
    int32_t nextWord;
    memcpy(&nextWord, pNextWord, sizeof(nextWord));  // slower but ARM-safe
My question (from a language-lawyer perspective) is: is my "ARM-fixed" approach well-defined under the C language rules?
My worry is that maybe even just having a misaligned-int32_t-pointer might be enough to invoke undefined behavior, even if I never actually dereference it directly. (If my concern is valid, I think I could fix the problem by changing pNextWord's type from (const int32_t *) to (const char *), but I'd rather not do that unless it's actually necessary to do so, since it would mean doing some pointer-stride arithmetic by hand.)
No, it is not well-defined. C11 6.3.2.3p7:
- A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. [...]
Indeed, the code that you assume is ARM-safe might not even be Intel-safe: compilers are known to generate code for Intel that can crash on unaligned access. While not in the linked case, it might just be that a clever compiler can take the conversion as proof that the address is indeed aligned and use specialized code for memcpy.
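One way to keep the memcpy approach without ever forming a misaligned int32_t pointer is to leave the source as a byte pointer and let memcpy do the typed copy. The following is only a minimal sketch: the helper name read_i32_at and its offset parameter are made up for illustration, and the value comes back in host byte order, so any endianness conversion is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original code): copy a 32-bit value
 * out of a byte buffer at an arbitrary, possibly misaligned, offset.
 * The source stays a uint8_t pointer, so no pointer conversion to
 * int32_t * takes place. The result is in host byte order. */
static int32_t read_i32_at(const uint8_t *buf, size_t offset)
{
    int32_t value;

    assert(buf != NULL);
    /* memcpy copies byte by byte as far as the language is concerned,
     * so it places no alignment requirement on the source address. */
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

With this, the line from the question would become something like int32_t nextWord = read_i32_at(buf, 5); and the only pointer arithmetic done by hand is the byte offset.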
Alignment aside, the original code, which dereferences the cast pointer, also suffers from a strict aliasing violation. C11 6.5p7:
- An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
Since the array buf is statically typed, with each element being char, the effective type of each element is char; you may access the contents of the array only as characters, not as int32_t.