# Fast way to find duplicates in a large array


I have an array with 35k elements. How may I efficiently find duplicates and return those duplicates?

```ruby
all = [*1..35000, 1]
```

This solution works:

```ruby
all.select { |v| all.count(v) > 1 }
```

But it takes a long time.

Your code takes so long because it calls `count` once for each element, and `count` itself scans the entire array, giving a computational complexity of O(n²).
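For contrast, a single counting pass is O(n). As a minimal sketch (using `Enumerable#tally`, available since Ruby 2.7, rather than the hash-building techniques shown below):

```ruby
arr = [*1..35_000, 1]

# One pass to build a value => count hash, one pass to filter it.
dups = arr.tally.select { |_, count| count > 1 }.keys
dups  #=> [1]
```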

```ruby
arr = [*1..35000, 1, 34999]
```

If you want to know which values appear in the array at least twice...

```ruby
require 'set'

uniq_set = Set.new
arr.each_with_object(Set.new) { |x, dup_set| uniq_set.add?(x) || dup_set.add(x) }.to_a
  #=> [1, 34999]
```

Set lookups (implemented with a hash under the covers) are extremely fast.
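The two-set approach hinges on `Set#add?`: it returns the set (truthy) when the element was newly added, and `nil` when the element was already present, which lets the `||` fall through to `dup_set.add`. A small illustration:

```ruby
require 'set'

s = Set.new
s.add?(1)  # returns the set (truthy): 1 was new
s.add?(1)  #=> nil: 1 was already present, so `||` evaluates its right side
```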

If you want to know how many times each value that appears at least twice occurs in the array...

```ruby
arr.each_with_object(Hash.new(0)) { |x, h| h[x] += 1 }.select { |_, v| v > 1 }
  #=> {1=>2, 34999=>2}
```

This uses a counting hash. See Hash::new, here called with a default value as its argument.
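The default value is what makes `h[x] += 1` work on a key that has never been seen: reading a missing key returns the default instead of `nil`. For example:

```ruby
counts = Hash.new(0)  # 0 is returned for any missing key
counts[:a] += 1       # counts[:a] was 0, now 1
counts[:a] += 1       # now 2
counts[:b]            #=> 0 (reading a missing key returns the default...)
counts                #=> {:a=>2} (...but does not store it)
```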

If you want to know the indices of values that appear in the array at least twice...

```ruby
arr.each_with_index.
    with_object({}) { |(x, i), h| (h[x] ||= []) << i }.
    select { |_, v| v.size > 1 }
  #=> {1=>[0, 35000], 34999=>[34998, 35001]}
```

When the hash `h` does not already have a key `x`, the expression expands as follows:

```ruby
(h[x] ||= []) << i
  #=> (h[x] = h[x] || []) << i
  #=> (h[x] = nil || []) << i
  #=> (h[x] = []) << i
  #=> [] << i, where [] is now h[x]
```
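When `h` already has the key `x`, `h[x]` is truthy, so `||=` leaves the existing array in place and `i` is simply appended to it. Both cases together:

```ruby
h = {}
(h[1] ||= []) << 0      # key absent: creates [] first, then appends 0
(h[1] ||= []) << 35000  # key present: ||= is a no-op, 35000 is appended
h                       #=> {1=>[0, 35000]}
```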