Does “where” position in LINQ query matter when joining in-memory?

  • A+
Category:Languages

Situation: Say we are executing a LINQ query that joins two in-memory lists (so no DbSets or SQL-query generation involved) and this query also has a where clause. This where only filters on properties included in the original set (the from part of the query).

Question: Does the linq query interpreter optimize this query in that it first executes the where before it performs the join, regardless of whether I write the where before or after the join? – so it does not have to perform a join on elements that are not included later anyways.

Example: For example, I have a categories list I want to join with a products list. However, I am just interested in the category with ID 1. Does the linq interpreter internally perform the exact same operations regardless of whether I write:

from category in categories join prod in products on category.ID equals prod.CategoryID where category.ID == 1 // <------ below join select new { Category = category.Name, Product = prod.Name }; 

or

from category in categories where category.ID == 1 // <------ above join join prod in products on category.ID equals prod.CategoryID select new { Category = category.Name, Product = prod.Name }; 

Previous research: I already saw this question but the OP author stated that his/her question is only targeting non-in-memory cases with generated SQL. I am explicitly interested with LINQ executing a join on two lists in-memory.

Update: This is not a dublicate of "Order execution of chain linq query" question as the referenced question clearly refers to a dbset and my question explicitly addressed a non-db scenario. (Moreover, although similar, I am not asking about inclusions based on navigational properties here but about "joins".)

Update2: Although very similar, this is also not a dublicate of "Is order of the predicate important when using LINQ?" as I am asking explicitly about in-memory situations and I cannot see the referenced question explicitly addressing this case. Moreover, the question is a bit old and I am actually interested in linq in the context of .NET Core (which didn't exist in 2012), so I updated the tag of this question to reflect this second point.

Please note: With this question I am aiming at whether the linq query interpreter somehow optimizes this query in the background and am hoping to get a reference to a piece of documentation or source code that shows how this is done by linq. I am not interested in answers such as "it does not matter because the performance of both queries is roughly the same".

 


The LINQ query syntax will be compiled to a method chain. For details, read e.g. in this question.

The first LINQ query will be compiled to the following method chain:

categories     .Join(         products,         category => category.ID,         prod => prod.CategoryID,         (category, prod) => new { category, prod })     .Where(t => t.category.ID == 1)     .Select(t => new { Category = t.category.Name, Product = t.prod.Name }); 

The second one:

categories     .Where(category => category.ID == 1)     .Join(         products,         category => category.ID,         prod => prod.CategoryID,         (category, prod) => new { Category = category.Name, Product = prod.Name }); 

As you can see, the second query will cause less allocations (note only one anonymous type vs 2 in the first query, and note how many instances of those anonymous types will be created on performing the query).

Furthermore, it's clear that the first query will perform a join operation on lot more data than the second (already filtered) one.

There will be no additional query optimization in case of LINQ-to-objects queries.

So the second version is preferable.

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: