Why are 1000 threads faster than a few?

  • A+

I have a simple program that searches linearly in an array of 2D points. I do 1000 searches into an array of 1 000 000 points.

The curious thing is that if I spawn 1000 threads, the program works as fast as when I span only as much as CPU cores I have, or when I use Parallel.For. This is contrary to everything I know about creating threads. Creating and destroying threads is expensive, but obviously not in this case.

Can someone explain why?

Note: this is a methodological example; the search algorithm is deliberately not meant do to optimal. The focus is on threading.

Note 2: I tested on an 4-core i7 and 3-core AMD, the results follow the same pattern!

using System; using System.Collections.Generic; using System.Diagnostics; using System.Threading;  /// <summary> /// We search for closest points. /// For every point in array searchData, we search into inputData for the closest point,  /// and store it at the same position into array resultData; /// </summary> class Program {     class Point     {         public double X { get; set; }         public double Y { get; set; }          public double GetDistanceFrom (Point p)         {             double dx, dy;             dx = p.X - X;             dy = p.Y - Y;             return Math.Sqrt(dx * dx + dy * dy);         }     }      const int inputDataSize = 1_000_000;     static Point[] inputData = new Point[inputDataSize];      const int searchDataSize = 1000;     static Point[] searchData = new Point[searchDataSize];     static Point[] resultData = new Point[searchDataSize];      static void GenerateRandomData (Point[] array)     {         Random rand = new Random();         for (int i = 0; i < array.Length; i++)         {             array[i] = new Point()             {                 X = rand.NextDouble() * 100_000,                 Y = rand.NextDouble() * 100_000             };         }     }      private static void SearchOne(int i)     {         var searchPoint = searchData[i];         foreach (var p in inputData)         {             if (resultData[i] == null)             {                 resultData[i] = p;             }             else             {                 double oldDistance = searchPoint.GetDistanceFrom(resultData[i]);                 double newDistance = searchPoint.GetDistanceFrom(p);                 if (newDistance < oldDistance)                 {                     resultData[i] = p;                 }             }         }     }      static void AllThreadSearch()     {         List<Thread> threads = new List<Thread>();         for (int i = 0; i < searchDataSize; i++)         {             var thread = new Thread(                 obj =>                 {                     int index = (int)obj;                     SearchOne(index);                 });             thread.Start(i);             threads.Add(thread);         }         foreach (var t in threads) t.Join();     }      static void FewThreadSearch()     {         int threadCount = Environment.ProcessorCount;         int workSize = searchDataSize / threadCount;         List<Thread> threads = new List<Thread>();         for (int i = 0; i < threadCount; i++)         {             var thread = new Thread(                 obj =>                 {                     int[] range = (int[])obj;                     int from = range[0];                     int to = range[1];                     for (int index = from; index < to; index++)                     {                         SearchOne(index);                     }                 }                 );             int rangeFrom = workSize * i;             int rangeTo = workSize * (i + 1);             thread.Start(new int[]{ rangeFrom, rangeTo });             threads.Add(thread);         }         foreach (var t in threads) t.Join();     }      static void ParallelThreadSearch()     {         System.Threading.Tasks.Parallel.For (0, searchDataSize,                  index =>                 {                     SearchOne(index);                 });     }      static void Main(string[] args)     {         Console.Write("Generatic data...  ");         GenerateRandomData(inputData);         GenerateRandomData(searchData);         Console.WriteLine("Done.");         Console.WriteLine();          Stopwatch watch = new Stopwatch();          Console.Write("All thread searching... ");         watch.Restart();         AllThreadSearch();         watch.Stop();         Console.WriteLine($"Done in {watch.ElapsedMilliseconds} ms.");          Console.Write("Few thread searching... ");         watch.Restart();         FewThreadSearch();         watch.Stop();         Console.WriteLine($"Done in {watch.ElapsedMilliseconds} ms.");          Console.Write("Parallel thread searching... ");         watch.Restart();         ParallelThreadSearch();         watch.Stop();         Console.WriteLine($"Done in {watch.ElapsedMilliseconds} ms.");          Console.WriteLine();         Console.WriteLine("Press ENTER to quit.");         Console.ReadLine();     } } 

EDIT: Please make sure to run the app outside the debugger. VS Debugger slows down the case of multiple threads.


You may want to consider how the application is accessing memory. In the maximum threads scenario you are effectively accessing memory sequentially, which is efficient from a caching point of view. The approach using a small number of threads is more random, causing cache misses. Depending on the CPU there are performance counters that allow you to measure L1 and L2 cache hits/misses.


:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: