Generics and usage of interfaces without boxing of value instances


As I understand, generics are an elegant solution to resolve issues with extra boxing/unboxing procedures which occur within generic collections like List<T>. But I cannot understand how generics can fix problems with using interfaces within a generic function. In other words, if I want to pass a value instance which implements an interface to a generic method, will boxing be performed? How does the compiler treat such cases?

As I understand, in order to use the interface method the value instance must be boxed, because calling a "virtual" function requires "private" information stored within the reference object (every reference object contains it, along with a sync block).

That's why I decided to analyze the IL code of a simple program to see if any boxing operations are used within the generic function:

using System;

public class main_class
{
    public interface INum<a> { a add(a other); }

    public struct MyInt : INum<MyInt>
    {
        public MyInt(int _my_int) { Num = _my_int; }
        public MyInt add(MyInt other) => new MyInt(Num + other.Num);
        public int Num { get; }
    }

    public static a add<a>(a lhs, a rhs) where a : INum<a> => lhs.add(rhs);

    public static void Main()
    {
        Console.WriteLine(add(new MyInt(1), new MyInt(2)).Num);
    }
}

I thought that add(new MyInt(1), new MyInt(2)) will use boxing operations because the add generic method uses the INum<a> interface (otherwise, how can the compiler emit a virtual method call on a value instance without boxing?). But I was very surprised. Here is part of the IL code of Main:

IL_0000: ldc.i4.1
IL_0001: newobj instance void main_class/MyInt::.ctor(int32)
IL_0006: ldc.i4.2
IL_0007: newobj instance void main_class/MyInt::.ctor(int32)
IL_000c: call !!0 main_class::'add'<valuetype main_class/MyInt>(!!0, !!0)
IL_0011: stloc.0

This listing has no box instructions. It seems that newobj does not create a value instance on the heap; for value types it creates them on the stack. Here is the description from the documentation:

(ECMA-335 standard (Common Language Infrastructure) III.4.21) Value types are not usually created using newobj. They are usually allocated either as arguments or local variables, using newarr (for zero-based, one-dimensional arrays), or as fields of objects. Once allocated, they are initialized using initobj. However, the newobj instruction can be used to create a new instance of a value type on the stack, that can then be passed as an argument, stored in a local, etc.

So, I decided to check out the add function. It's very interesting, because it does not contain box instructions either:

.method public hidebysig static
    !!a 'add'<(class main_class/INum`1<!!a>) a> (
        !!a lhs,
        !!a rhs
    ) cil managed
{
    // Method begins at RVA 0x2050
    // Code size 15 (0xf)
    .maxstack 8

    IL_0000: ldarga.s lhs
    IL_0002: ldarg.1
    IL_0003: constrained. !!a
    IL_0009: callvirt instance !0 class main_class/INum`1<!!a>::'add'(!0)
    IL_000e: ret
} // end of method main_class::'add'

What's wrong with my assumptions? Can generics invoke virtual methods of values without boxing?

 


As I understand, generics are an elegant solution to resolve issues with extra boxing/unboxing procedures which occur within generic collections like List<T>.

Eliminating boxing was a by-design scenario for generics, yes. But as Damien points out in a comment, the more general feature was enabling more concise, more type-safe code.

if I want to pass a value instance which implements an interface to a generic method, will boxing be performed?

Sometimes, yes. But since boxing is expensive, the CLR looks for ways to avoid it.

I thought that add(new MyInt(1), new MyInt(2)) will use boxing operations because the add generic method uses the INum<a> interface

I see why you made that deduction, but it is wrong. How the body of the method you called uses the information is irrelevant. The question is: what is the signature of the method you are calling? C# type inference determines that you are calling add<MyInt>, and therefore the signature is equivalent to calling:

public static MyInt add(MyInt lhs, MyInt rhs) 

Now, you rightly point out that there is a constraint. The C# compiler verifies the constraint is met, which it is. That does not change the calling convention of the method. The method takes two MyInts, and you've passed it two MyInts, and they are value types, so they are passed by value.
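As a rough sketch of that equivalence, the snippet below calls the generic method with an inferred type argument, with an explicit one, and then calls a hand-written non-generic method of the same shape. AddMyInt and SignatureDemo are hypothetical names introduced here for illustration; the types are restated from the question (outside the main_class wrapper) so the example compiles on its own.

```csharp
using System;

public interface INum<T> { T add(T other); }

public struct MyInt : INum<MyInt>
{
    public MyInt(int num) { Num = num; }
    public MyInt add(MyInt other) => new MyInt(Num + other.Num);
    public int Num { get; }
}

public static class SignatureDemo
{
    // The generic method from the question.
    public static T add<T>(T lhs, T rhs) where T : INum<T> => lhs.add(rhs);

    // Hypothetical hand-written counterpart of add<MyInt>:
    // both parameters are MyInt values, so callers pass copies, not boxes.
    public static MyInt AddMyInt(MyInt lhs, MyInt rhs) => lhs.add(rhs);

    public static void Main()
    {
        // Inferred and explicit type arguments bind to the same instantiation.
        Console.WriteLine(add(new MyInt(1), new MyInt(2)).Num);        // prints 3
        Console.WriteLine(add<MyInt>(new MyInt(1), new MyInt(2)).Num); // prints 3
        Console.WriteLine(AddMyInt(new MyInt(1), new MyInt(2)).Num);   // prints 3
    }
}
```

In all three calls the arguments are MyInt values at the call site; the constraint only affects what the method body is allowed to do with them.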

It seems that newobj does not create a value instance on the heap; for value types it creates them on the stack.

Make sure that this is clear: it creates them on the abstract evaluation stack of the IL program. Whether the jitter turns that code into code that puts the values on the actual stack of the current thread is an implementation detail of the jitter. It could choose to put them in registers, for example, or into a data structure that has the logical properties of a stack, but is actually stored on the heap, or whatever.

add does not contain box instructions either

Yes it does, you just aren't seeing them. It contains a constrained callvirt which is a conditional box.

constrained callvirt has the semantics:

  • there must be a reference to the receiver on the stack. There is: ldarga puts the address of the receiver on the stack. If the receiver is a reference type, the address of the variable containing the reference will be on the stack. If it is a value type, then the address of the variable that holds the value type will be on the stack. (Again, this is the stack of the virtual machine we are reasoning about here.)

  • the arguments must be on the stack. They are; the argument to INum<MyInt>.add is a MyInt, and again, that is passed by value, and the value is on the stack from the ldarg.

  • if the receiver is a reference type, we then dereference the double-reference we just created to get the reference and the virtual call happens normally. (Of course, the jitter is free to optimize away this double-reference! Remember, all these semantics I am describing are of the virtual machine of the IL program, not of the real machine you're running it on!)

  • if the receiver is a value type and the value type implements the method you're calling, then the value type's method is called normally: that is, without boxing the value. This is the case your example is in, so we avoid boxing.

  • if the receiver is a value type that does not implement the method you're calling, then the value type is boxed, and the method is called with a reference to the box as the receiver. Exercise to the reader: Create a program that falls into this case.
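One possible answer to that exercise, offered as a sketch rather than the definitive one: call a method inherited from System.Object, such as ToString, on a struct that does not override it. The type and method names below (WithOverride, WithoutOverride, Describe, ConstrainedDemo) are hypothetical. The boxing itself is not visible in the program's output, and the jitter remains free to optimize; the comments describe what the IL-level rules above specify.

```csharp
using System;

// Overrides ToString, so the constrained call can invoke the struct's
// own method directly on the address of the value.
public struct WithOverride
{
    public override string ToString() => "WithOverride";
}

// Does not override ToString, so the only implementation is the one
// inherited from System.ValueType, which expects an object reference.
public struct WithoutOverride { }

public static class ConstrainedDemo
{
    // For value.ToString() on a type parameter, the C# compiler emits
    // "constrained. !!T" followed by "callvirt Object::ToString".
    public static string Describe<T>(T value) => value.ToString();

    public static void Main()
    {
        // Value type that implements the method: called without boxing.
        Console.WriteLine(Describe(new WithOverride()));

        // Value type that does not implement the method: the receiver
        // is boxed and the inherited method runs on the box.
        Console.WriteLine(Describe(new WithoutOverride()));
    }
}
```

Inspecting the IL of Describe should show the same constrained callvirt pair as in the question's add method; which of the last two cases applies is decided per instantiation, not per call site in the C# source.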

What's wrong with my assumptions?

You've assumed that calls to methods on value types via interfaces must box the receiver, but that's not true.
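The difference can be made observable with a mutable struct, which is used here only because mutation exposes whether a copy was made. This is a minimal sketch; ICounter, Counter, BumpGeneric and BumpBoxed are hypothetical names that do not appear in the question.

```csharp
using System;

public interface ICounter
{
    void Increment();
    int Count { get; }
}

public struct Counter : ICounter
{
    private int _count;
    public void Increment() => _count++;
    public int Count => _count;
}

public static class BoxingDemo
{
    // Generic with a constraint: the call goes through a constrained
    // callvirt on the caller's variable, so no box is created.
    public static void BumpGeneric<T>(ref T counter) where T : ICounter
        => counter.Increment();

    // Interface-typed parameter: a struct argument must first be
    // converted to ICounter, which boxes a copy of it.
    public static void BumpBoxed(ICounter counter) => counter.Increment();

    public static void Main()
    {
        var c = new Counter();

        BumpGeneric(ref c);
        Console.WriteLine(c.Count);     // prints 1: the original was mutated

        BumpBoxed(c);                   // boxes a copy and increments the copy
        Console.WriteLine(c.Count);     // prints 1: the original is unchanged

        ICounter boxed = c;             // explicit box holding a copy with Count == 1
        BumpBoxed(boxed);               // same box is passed, no new copy
        Console.WriteLine(boxed.Count); // prints 2
    }
}
```

The second call shows the boxing the question expected: it happens when a value is converted to the interface type, not merely because an interface method is invoked inside a constrained generic method.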

Can generics invoke virtual methods of values without boxing?

Yes.
