Is reinterpret_cast type punning actually undefined behavior?


It appears to be widely held that type punning via reinterpret_cast is somehow prohibited in C++ (properly: that it is "undefined behavior", that is, "behavior for which this International Standard imposes no requirements", with an explicit note that implementations may define the behavior). Am I incorrect in using the following reasoning to disagree, and if so, why?


[expr.reinterpret.cast]/11 states:

A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type. [ Note: That is, for lvalues, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). — end note ] No temporary is created, no copy is made, and constructors or conversion functions are not called.

with the footnote:

75) This is sometimes referred to as a type pun.

/11 implicitly, via example, carries the restrictions of /6 through /10, but perhaps the most common usage (punning objects) is addressed in [expr.reinterpret.cast]/7:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). [ Note: Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note ]

Clearly the purpose cannot be conversion to/from pointers or references to void, as:

  1. the example in /7 clearly demonstrates that static_cast should suffice in the case of pointers, as do [expr.static.cast]/13 and [conv.ptr]/2; and
  2. [conversions to] references to void are prima facie invalid.

Further, [basic.lval]/8 states:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

(8.1) the dynamic type of the object,

(8.2) a cv-qualified version of the dynamic type of the object,

(8.3) a type similar to the dynamic type of the object,

(8.4) a type that is the signed or unsigned type corresponding to the dynamic type of the object,

(8.5) a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

(8.6) an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

(8.7) a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

(8.8) a char, unsigned char, or std::byte type.

And if we return to [expr.reinterpret.cast]/11 for a moment, we see "The result refers to the same object as the source glvalue, but with the specified type." This reads to me as an explicit statement that the result of reinterpret_cast<T&>(v) is an lvalue reference to an object of type T, to which access is clearly "through a glvalue of" "the dynamic type of the object". This sentence also addresses the argument that various paragraphs of [basic.life] apply via the spurious claim that the results of such conversions refer to a new object of type T, the lifetime of which has not yet begun, which just happens to reside at the same memory address as v.

It seems nonsensical to explicitly define such conversions only to disallow standard-defined use of the results, particularly in light of footnote 75 noting that such [reference] conversion is "sometimes referred to as a type pun."

Note that my references are to the final publicly available draft for C++17 (N4659), but the language in question is little changed from N3337 (C++11) through N4788 (C++20 WD) (tip link will likely refer to later drafts in time). In fact, the footnote to [expr.reinterpret.cast]/11 is made even more explicit in the most recent draft:

This is sometimes referred to as a type pun when the result refers to the same object as the source glvalue.

 


I believe your misunderstanding lies here:

This reads to me as an explicit statement that the result of reinterpret_cast<T&>(v) is an lvalue reference to an object of type T, to which access is clearly "through a glvalue of" "the dynamic type of the object".

The dynamic type of an object is the type of the object that is currently living in a given place (effectively, the type of the object that was constructed/initialized in that piece of memory). For example:

    float my_float = 42.0f;
    std::uint32_t& ui = reinterpret_cast<std::uint32_t&>(my_float);

Here, ui is a glvalue that refers to the object created by the definition of my_float. The cast itself is fine, but accessing this object through the reference ui would invoke undefined behavior, because the dynamic type of the object the reference refers to is float while the type of the glvalue is std::uint32_t, and that combination is not on the list in [basic.lval]/8.

There are few valid uses of a reinterpret_cast like that, but use cases other than just casting to void* and back exist (for the latter, static_cast would be sufficient, as you noted yourself). [basic.lval]/8 effectively gives you a complete list of what they are. For example, it would be valid to examine (and even copy, if the dynamic type of the object is trivially copyable, [basic.types]/9) the value of an object by casting the address of the object to char*, unsigned char*, or std::byte* (not signed char*, however). It would be valid to reinterpret_cast an object of signed type to access it as its corresponding unsigned type and vice versa. It would also be valid to cast a pointer/reference to a union to a pointer/reference to a member of that union and access that member through the resulting lvalue if that member is the active member of the union…
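As a rough illustration of the permitted cases listed above, here is a minimal sketch. The function names (count_nonzero_bytes, as_unsigned, bits_of) are made up for this example, and the last one assumes float is a 32-bit IEEE 754 type, which the standard does not guarantee:

```cpp
#include <cassert>   // only needed for the usage checks mentioned below
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: examine the bytes of a float through unsigned char*,
// one of the glvalue types permitted by [basic.lval]/8.8.
std::size_t count_nonzero_bytes(const float& f) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&f);
    std::size_t n = 0;
    for (std::size_t i = 0; i < sizeof f; ++i) {
        if (p[i] != 0) ++n;
    }
    return n;
}

// Hypothetical helper: access an int through its corresponding unsigned
// type, permitted by [basic.lval]/8.4.
unsigned as_unsigned(int& i) {
    return reinterpret_cast<unsigned&>(i);
}

// Hypothetical helper: obtain the bit pattern of a float by copying its
// bytes into a std::uint32_t instead of reading it through a
// std::uint32_t lvalue. Assumption: float is 32-bit IEEE 754.
std::uint32_t bits_of(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    static_assert(std::numeric_limits<float>::is_iec559,
                  "sketch assumes IEEE 754 floats");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // copies the object representation
    return u;
}
```

Under the stated IEEE 754 assumption, bits_of(1.0f) yields 0x3F800000, while reading the same float through a std::uint32_t& would still be undefined.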

The main reason why type punning through casts like this is undefined in general is that making it defined behavior would prohibit some extremely vital compiler optimizations. If any object of any type could simply be accessed through an lvalue of any other type, then the compiler would have to assume that any modification of an object through some lvalue can potentially affect the value of any object in the program unless it can prove otherwise. As a result, it would basically be impossible, for example, to keep values in registers for any useful period of time, because any modification of anything would immediately invalidate whatever is held in registers at the moment. Yes, any good optimizer will perform aliasing analysis. But while such methods certainly work and are powerful, they can, in principle, only cover a subset of cases. Disproving or proving aliasing in general is basically impossible (equivalent to solving the halting problem I would think)…
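To make the optimization argument concrete, here is a small sketch with two hypothetical functions. Whether a given compiler actually folds the return value in the first one is an assumption about typical optimizers, not something the standard requires:

```cpp
#include <cassert>   // only needed for the usage checks mentioned below

// Because an int object may not be accessed through a float lvalue,
// the compiler may assume that i and f point to different objects.
int store_and_reload(int* i, float* f) {
    *i = 1;
    *f = 2.0f;   // in a program with defined behavior this cannot change *i
    return *i;   // so an optimizer is free to return the constant 1
}

// unsigned char may alias any object ([basic.lval]/8.8), so here the
// compiler has to allow for the store through c modifying *i.
int store_and_reload_bytes(int* i, unsigned char* c) {
    *i = 1;
    *c = 0xFF;   // may overwrite one byte of *i
    return *i;   // must be reloaded unless aliasing can be ruled out
}
```

If c is made to point at the first byte of the int that i refers to, the second function returns something other than 1; passing a float* obtained by casting an int* to the first function would be undefined instead.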
