I have a singleton object that processes requests. Each request takes around one millisecond to complete, usually less. This object is not thread-safe; it expects requests in a particular format, encapsulated in the Request class, and returns the result as a Response. This processor has another producer/consumer that sends/receives through a socket.
I implemented the producer/consumer approach to work fast:
- Client prepares a RequestCommand command object that contains a TaskCompletionSource<Response> and the intended request.
- Client adds the command to the "request queue" (Queue<>) and awaits the command's task.
- A different thread (an actual background Thread) pulls the command from the "request queue", processes the request, produces the Response and signals the command as done using the TaskCompletionSource<Response>.
- Client continues working.
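The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual code: the member names (`Completion`, `Payload`, `Result`, `RequestPump`, `SendAsync`) are hypothetical, the singleton is replaced by a synchronous `Func<Request, Response>` stand-in, and `RunContinuationsAsynchronously` is an assumption added so that client continuations do not run on the worker thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class Request { public string Payload = ""; }   // hypothetical shape
public sealed class Response { public string Result = ""; }   // hypothetical shape

// One command per request: carries the request and the completion source the client awaits.
public sealed class RequestCommand
{
    public Request Request { get; }
    public TaskCompletionSource<Response> Completion { get; } =
        new TaskCompletionSource<Response>(TaskCreationOptions.RunContinuationsAsynchronously);

    public RequestCommand(Request request) { Request = request; }
}

public sealed class RequestPump
{
    private readonly Queue<RequestCommand> _queue = new Queue<RequestCommand>();
    private readonly object _gate = new object();
    private readonly Func<Request, Response> _process; // stand-in for the non-thread-safe singleton

    public RequestPump(Func<Request, Response> process)
    {
        _process = process;
        new Thread(Run) { IsBackground = true }.Start();
    }

    // Client side: enqueue the command and hand back the task to await.
    public Task<Response> SendAsync(Request request)
    {
        var command = new RequestCommand(request);
        lock (_gate)
        {
            _queue.Enqueue(command);
            Monitor.Pulse(_gate);
        }
        return command.Completion.Task;
    }

    // Worker side: a single thread drains the queue, so the processor is never called concurrently.
    private void Run()
    {
        while (true)
        {
            RequestCommand command;
            lock (_gate)
            {
                while (_queue.Count == 0) Monitor.Wait(_gate);
                command = _queue.Dequeue();
            }
            try { command.Completion.SetResult(_process(command.Request)); }
            catch (Exception ex) { command.Completion.SetException(ex); }
        }
    }
}
```

In this sketch every call allocates one RequestCommand, one TaskCompletionSource<Response> and its Task, plus whatever the Response needs.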
But when doing a small memory benchmark I see LOTS of these objects being created, topping the list of most common objects in memory. Note that there is no memory leak; the GC cleans everything up nicely each time it triggers, but creating so many objects so quickly makes Gen 0 very big. I wonder if better memory usage may yield better performance.
I was considering converting some of these objects to structs to avoid allocations, especially now that there are some new features for working with them in C# 7.1. But I do not see a way of doing it.
- Value types can be instantiated on the stack, but if they pass from thread to thread, they must be copied stackA->heap and heap->stackB, I guess. Also, when enqueued in the queue, the value goes from stack to heap.
- The singleton object is truly asynchronous. There is some in-memory processing, but 90% of the time it needs to call outside, going through the internal producer/consumer.
- ValueTask<> does not seem to fit here, because things are asynchronous.
- TaskCompletionSource<> has a state, but it is object, so it would be boxed.
- The command also jumps from thread to thread.
- Recycling objects only works for the command itself; its content cannot be recycled.
Is there any way I could leverage structs to reduce the memory usage and/or improve the performance? Is there any other option?
> Value types can be instantiated in the stack, but if they pass from thread to thread, they must be copied to the stackA->heap and heap->stackB I guess.
No, that's not at all true. But you have a deeper problem in your thinking here:
Immediately stop thinking of structs as living on the stack. When you make an int array with a million ints, you think those four million bytes of ints live on your one-million-byte stack? Of course not.
The truth is that stack vs heap has nothing whatsoever to do with value types. Instead of "stack and heap", start saying "short term allocation pool" and "long term allocation pool". Variables that have short lifetimes are allocated from the short term allocation pool, regardless of whether that variable contains an int or a reference to an object. Once you start thinking about variable lifetime correctly then your reasoning becomes entirely straightforward. Short-lived things live in the short term pool, obviously.
So: when you pass a struct from one thread to another, does it ever live "on the heap"? The question is nonsensical because values are not things that live on the heap. Variables are things that are storage; variables store values.
So: Is it the case that turning classes into structs will improve performance because "those structs can live on the stack"? No, of course not. The relevant difference between reference types and value types is not where they live but how they are copied. Value types are copied by value, reference types are copied by reference, and reference copies are the fastest copies.
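To make the copying difference concrete, here is a small sketch. The `PointStruct` and `PointClass` types are hypothetical and exist only for illustration. It also shows struct values stored in an array, which is long-lived storage and has nothing to do with the stack.

```csharp
using System;

public struct PointStruct { public int X; public int Y; }
public sealed class PointClass { public int X; public int Y; }

public static class CopyDemo
{
    public static void Main()
    {
        // The struct values are stored inline in the array's own storage,
        // and arrays are allocated from the long-term pool.
        var structs = new PointStruct[1000];

        PointStruct a = structs[0]; // copies the whole value (two ints)
        a.X = 42;
        Console.WriteLine(structs[0].X); // prints 0: the array element is unchanged

        var classes = new PointClass[1000];
        classes[0] = new PointClass();

        PointClass b = classes[0]; // copies only the reference
        b.X = 42;
        Console.WriteLine(classes[0].X); // prints 42: both variables refer to one object
    }
}
```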
> I see LOTS of these objects being created and topping the list of most common object in memory. Note that there is no memory leak, the GC can clean everything up nicely each time triggers, but obviously so many objects being created fast, makes Gen 0 very big. I wonder if a better memory usage may yield better performance.
OK, now we come to the sensible part of your question. This is an excellent observation and it is one which is testable with science. The first thing you should do is to use a profiler to determine what is the actual burden of gen 0 collections on the performance of your application.
It may be that this burden is not the slowest thing in your program and in fact it is irrelevant. In that case, you will now know to concentrate your efforts on the real problem, rather than chasing down memory allocation problems that aren't real problems.
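A real profiler is the right tool for this. As a rough first look, though, you can count gen 0 collections around a workload with `GC.CollectionCount(0)`. The sketch below is only that: `Gen0Probe` and `Measure` are hypothetical names, and the allocation loop is a stand-in for your per-request allocations, not a model of them.

```csharp
using System;
using System.Diagnostics;

public static class Gen0Probe
{
    // Runs the workload and reports how many gen 0 collections happened while it ran.
    public static (int Gen0Collections, TimeSpan Elapsed) Measure(Action workload)
    {
        int before = GC.CollectionCount(0);
        var watch = Stopwatch.StartNew();
        workload();
        watch.Stop();
        return (GC.CollectionCount(0) - before, watch.Elapsed);
    }

    public static void Main()
    {
        var result = Measure(() =>
        {
            for (int i = 0; i < 1_000_000; i++)
            {
                // Hypothetical stand-in for the short-lived objects created per request.
                GC.KeepAlive(new object[4]);
            }
        });

        Console.WriteLine($"gen 0 collections: {result.Gen0Collections}, elapsed: {result.Elapsed}");
    }
}
```

The count tells you how often collections happen, not how much time they cost. Getting the cost is what the profiler is for.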
Suppose you discover that gen 0 collections really are killing your performance; what can you do? Is the answer to make more things structs? That can work, but you have to be very careful:
- If the structs themselves contain references, you've just pushed the problem off one level, you haven't solved it.
- If the structs are larger than reference size -- and of course they almost always are -- then now you are copying them by copying the entire struct rather than copying a reference, and you've traded a GC time problem for a copy time problem. That might be a win, or a loss; use science to find out which it is.
When we were faced with this problem in Roslyn, we thought about it very carefully and did a lot of experiments. The strategy we went with was in general not to move things onto the stack. Rather, we identified how many small, short-lived objects there were active in memory at any one time, of each type -- using a profiler -- and then implemented a pooling strategy on those objects. You need a small object, you take it out of the pool. When you're done, you put it back in the pool. What happens is, you end up with O(number of objects active at any one time) in the pool, which quickly gets moved into the gen 2 heap; you then greatly lower your collection pressure on the gen 0 heap while increasing the cost of comparatively rare gen 2 collections.
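The take-and-return idea can be sketched as below. This is not the Roslyn implementation, just a minimal unbounded pool built on `ConcurrentBag<T>`. `SimplePool`, `Rent`, `Return` and `PooledCommand` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class SimplePool<T> where T : class
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _factory;
    private readonly Action<T> _reset;

    public SimplePool(Func<T> factory, Action<T> reset)
    {
        _factory = factory;
        _reset = reset;
    }

    // Take a pooled instance if one is available; otherwise allocate a new one.
    public T Rent() => _items.TryTake(out var item) ? item : _factory();

    // Clear the instance's state and make it available again.
    public void Return(T item)
    {
        _reset(item);
        _items.Add(item);
    }
}

public sealed class PooledCommand { public string Payload = ""; } // hypothetical pooled type

public static class PoolDemo
{
    public static void Main()
    {
        var pool = new SimplePool<PooledCommand>(() => new PooledCommand(), c => c.Payload = "");

        var first = pool.Rent();
        first.Payload = "ping";
        pool.Return(first);

        var second = pool.Rent();
        Console.WriteLine(object.ReferenceEquals(first, second)); // True: the instance was reused
        Console.WriteLine(second.Payload.Length);                 // 0: state was reset on return
    }
}
```

A pool like this only helps for objects that can be reset. The caller also has to guarantee nothing still holds a reference to an instance after returning it.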
I'm not saying that's the best choice for you. I'm saying that we had this same problem in Roslyn, and we solved it with science. You can do the same.