Why does Enumerable.Single() iterate all elements, even when more than one item has already been found?


While profiling one of our applications, we discovered a mysterious slowdown in code that called Enumerable.Single(source, predicate) on a large collection in which more than one item matching the predicate appeared near the start of the collection.

Investigation revealed that the implementation of Enumerable.Single() is as follows:

public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    TSource result = default(TSource);
    long count = 0;
    // Note how this always iterates through ALL the elements:
    foreach (TSource element in source)
    {
        if (predicate(element))
        {
            result = element;
            checked { count++; }
        }
    }
    switch (count)
    {
        case 0: throw Error.NoMatch();
        case 1: return result;
    }
    throw Error.MoreThanOneMatch();
}

That implementation will iterate through every element of the sequence, even if more than one element has already matched the predicate.
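One way to see the difference on a given runtime is to count how many times the predicate is evaluated when the first two matches sit at the front of a long sequence. This is a small hypothetical test harness, not code from the framework; the type and method names are my own:

```csharp
using System;
using System.Linq;

static class SinglePredicateProbe
{
    // Returns how many times Single(predicate) evaluated the predicate
    // before throwing "Sequence contains more than one matching element".
    public static int CountPredicateCalls(int sequenceLength)
    {
        int calls = 0;
        try
        {
            // Elements 0 and 1 both match, right at the start of the sequence.
            Enumerable.Range(0, sequenceLength)
                      .Single(x => { calls++; return x < 2; });
        }
        catch (InvalidOperationException)
        {
            // Expected: more than one element matched the predicate.
        }
        return calls;
    }

    static void Main()
    {
        // An implementation that scans the whole sequence reports the full
        // length here; a short-circuiting implementation stops at the
        // second match and reports 2.
        Console.WriteLine(CountPredicateCalls(1_000_000));
    }
}
```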

The following implementation would appear to yield the same results:

public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    TSource result = default(TSource);
    long count = 0;
    foreach (TSource element in source)
    {
        if (predicate(element))
        {
            if (count == 1) // Exit loop immediately if more than one match found.
                throw Error.MoreThanOneMatch();

            result = element;
            count++; // "checked" is no longer needed.
        }
    }

    if (count == 0)
        throw Error.NoMatch();

    return result;
}

Does anyone know why the actual implementation doesn't use this obvious optimization? Is there something I'm missing? (I can't imagine that such an obvious optimization would be overlooked, and therefore there must be some concrete reason for it.)

(Note: I realize that this question may attract answers that are opinions; I'm hoping for answers that provide concrete reasons for iterating all elements. If the answer is actually "because the designers didn't think such an optimization was necessary", then this question is unanswerable and I guess I should just delete it...)


For comparison, look at the implementation of Single() which does not take a predicate:

public static TSource Single<TSource>(this IEnumerable<TSource> source)
{
    IList<TSource> list = source as IList<TSource>;
    if (list != null)
    {
        switch (list.Count)
        {
            case 0: throw Error.NoElements();
            case 1: return list[0];
        }
    }
    else
    {
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            if (!e.MoveNext()) throw Error.NoElements();
            TSource result = e.Current;
            if (!e.MoveNext()) return result;
        }
    }
    throw Error.MoreThanOneElement();
}

In this case, they went to the effort of adding an optimization for sources that implement IList<TSource>, which can answer via Count without any enumeration.
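That IList fast path can be observed with a small wrapper that counts how often an enumerator is requested. This is hypothetical probe code of my own, not framework code; on implementations with the fast path shown above, a single-element list is resolved through Count and the indexer, so the counter stays at zero:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Wraps a List<T> and records every request for an enumerator.
class CountingList<T> : IList<T>
{
    private readonly List<T> _inner;
    public int EnumeratorRequests;

    public CountingList(IEnumerable<T> items) { _inner = new List<T>(items); }

    public T this[int i] { get => _inner[i]; set => _inner[i] = value; }
    public int Count => _inner.Count;
    public bool IsReadOnly => false;
    public void Add(T item) => _inner.Add(item);
    public void Clear() => _inner.Clear();
    public bool Contains(T item) => _inner.Contains(item);
    public void CopyTo(T[] array, int index) => _inner.CopyTo(array, index);
    public int IndexOf(T item) => _inner.IndexOf(item);
    public void Insert(int i, T item) => _inner.Insert(i, item);
    public bool Remove(T item) => _inner.Remove(item);
    public void RemoveAt(int i) => _inner.RemoveAt(i);

    public IEnumerator<T> GetEnumerator()
    {
        EnumeratorRequests++;
        return _inner.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class Demo
{
    static void Main()
    {
        var list = new CountingList<int>(new[] { 42 });
        int value = list.Single();  // answered via list.Count and list[0]
        Console.WriteLine(value);
        Console.WriteLine(list.EnumeratorRequests); // 0 on the fast path
    }
}
```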

 


You don't seem to be the only one who thought that. The .NET Core implementation has an optimized version:

using (IEnumerator<TSource> e = source.GetEnumerator())
{
    while (e.MoveNext())
    {
        TSource result = e.Current;
        if (predicate(result))
        {
            while (e.MoveNext())
            {
                if (predicate(e.Current))
                {
                    throw Error.MoreThanOneMatch();
                }
            }

            return result;
        }
    }
}

throw Error.NoMatch();

So, to answer your question: there doesn't seem to be a good reason, other than a developer simply not thinking about optimizing this use case.
