O(NlogN) algorithm runs faster than O(n)… wait, what?

  • A+

I am a bit confused, to be honest. I was working out the one of the classical algorithm problems. Given a collection of integers, find if are there 2 elements summing to a given number.

So I have implemented 2 solutions.

bool find1(std::vector<int>& V, int sum)  {     std::unordered_set<int> hashTable;     for (int i = 0; i < V.size(); ++i)      {         if (hashTable.find(V[i]) != hashTable.end())          {             return true;         }         hashTable.insert(sum - V[i]);     }     return false; }  bool find2(std::vector<int>& V, int sum)  {     for (int i = 0; i < V.size() ; ++i)      {         if (std::binary_search(V.begin(), V.end(), sum - V[i]))          {             return true;         }     }     return false; } 

Find1 is expected to be a linear algorithm (depending on the load of buckets and efficiency of hashing function).

Find2 is expected to be NlogN, we loop and do a binary search for every iteration.

After implementing this function I tried to test the running times of these algos on a relatively big collection, and the result confused me..

int main()  {     std::vector<int> V(10000,0);      std::chrono::system_clock::time_point now1 = std::chrono::system_clock::now();      for (int i = 0; i < 100; ++i)      {         bool b = find1(V, 1000);     }      std::chrono::system_clock::time_point then1 = std::chrono::system_clock::now();     std::cout <<"Linear with hashing = "<< std::chrono::duration_cast<std::chrono::microseconds>(then1 - now1).count()<<std::endl;      std::chrono::system_clock::time_point now2 = std::chrono::system_clock::now();     std::sort(V.begin(), V.end());     for (int i = 0; i < 100; ++i)     {         bool b = find2(V, 1000);     }      std::chrono::system_clock::time_point then2 = std::chrono::system_clock::now();     std::cout <<"NlogN with binary_search = " <<std::chrono::duration_cast<std::chrono::microseconds>(then2 - now2).count() << std::endl;      system("pause"); } 

Here I initialize the vector with 0s to be sure that both algos run the worst case.
The output of the program is:

Linear with hashing = 6759245          NlogN with binary_search = 4508025 

How this is possible? Can anyone please explain this to me?


Just because the upper bound of the asymptotical complexity of one algorithm is less than another, doesn't mean that it's faster for any arbitrary input. It just means that there exists a certain size of input N', beyond which the less complex algorithm will be faster. This size will be specific to each particular system running the program.

Measuring the asymptotically more complex algorithm to be faster simply means that your test was below the size N'. However, that assumes that your complexity analysis applies to the program in the first place. For example, analyzing the worst case complexity is wrong if your program tests the algorithm with best-case input and vice versa.

For what it's worth, the results of on my system are:

Linear with hashing = 9557 NlogN with binary_search = 15828 


:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: