Find most common string in a 2D array


I have a 2D list:

```python
arr = [['Mohit', 'shini', 'Manoj', 'Mot'],
       ['Mohit', 'shini', 'Manoj'],
       ['Mohit', 'Vis', 'Nusrath']]
```

I want to find the most frequent element in the 2D list. In the above example, the most common string is 'Mohit'.

I know I can brute-force this with two for loops and a dictionary, but is there a more efficient way using numpy or another library?
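For reference, the brute-force baseline mentioned above might look like the sketch below (the function name `most_common_loops` is hypothetical). Note that it is still a single O(n) counting pass over all items, so libraries can mostly only reduce the per-item overhead, not the asymptotic cost.

```python
# Sketch of the "two for loops and a dictionary" baseline.
arr = [['Mohit', 'shini', 'Manoj', 'Mot'],
       ['Mohit', 'shini', 'Manoj'],
       ['Mohit', 'Vis', 'Nusrath']]

def most_common_loops(rows):
    """Return the most frequent string in a list of lists (rows may differ in length)."""
    counts = {}
    for row in rows:          # outer loop over the sublists
        for item in row:      # inner loop over the strings in each sublist
            counts[item] = counts.get(item, 0) + 1
    # max() returns the first maximal key, so ties go to the key inserted first.
    return max(counts, key=counts.get)

print(most_common_loops(arr))  # Mohit
```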

The nested lists may have different lengths.
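Since numpy is mentioned, here is one possible numpy sketch (not taken from any of the answers; the function name `most_common_numpy` is hypothetical). Because the rows can have different lengths, `np.array(arr)` would not give a regular 2D string array, so each row is converted on its own and joined with `np.concatenate` before `np.unique(..., return_counts=True)` counts the distinct values. `np.unique` sorts its input, so this is O(n log n), and the list-to-array conversion adds overhead; on small lists of strings it may well be slower than `collections.Counter`.

```python
import numpy as np

arr = [['Mohit', 'shini', 'Manoj', 'Mot'],
       ['Mohit', 'shini', 'Manoj'],
       ['Mohit', 'Vis', 'Nusrath']]

def most_common_numpy(rows):
    # Convert each (possibly differently sized) row to a string array, then join them.
    # dtype=str keeps empty rows as string arrays too, so concatenation stays string-typed.
    flat = np.concatenate([np.asarray(row, dtype=str) for row in rows])
    # np.unique returns the sorted distinct values and how often each occurs.
    values, counts = np.unique(flat, return_counts=True)
    # argmax picks the first maximum, i.e. the lexicographically smallest value on ties.
    return str(values[counts.argmax()])

print(most_common_numpy(arr))  # Mohit
```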

Could answerers also include timings for their methods, so I can find the fastest one? Please also mention any caveats under which a method might not be very efficient.

Edit

These are the timings of different methods on my system:

```
# timgeb
%%timeit
collections.Counter(chain.from_iterable(arr)).most_common(1)[0][0]
5.91 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Kevin Fang and Curious Mind
%%timeit
flat_list = [item for sublist in arr for item in sublist]
collections.Counter(flat_list).most_common(1)[0]
6.42 µs ± 501 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
c = collections.Counter(item for sublist in arr for item in sublist).most_common(1)
c[0][0]
6.79 µs ± 449 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Mayank Porwal
def most_common(lst):
    return max(set(lst), key=lst.count)

%%timeit
ls = list(chain.from_iterable(arr))
most_common(ls)
2.33 µs ± 42.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# U9-Forward
%%timeit
l = [x for i in arr for x in i]
max(l, key=l.count)
2.6 µs ± 68.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

Mayank Porwal's method runs the fastest on my system.
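One caveat worth checking: `max(set(lst), key=lst.count)` runs a full `lst.count()` scan for every distinct value, so its cost grows roughly with n × k (n items, k distinct values), while `Counter` does a single O(n) pass. On the tiny example k is small, which favours the `count` approach; with many distinct strings it should fall behind. The sketch below uses synthetic data (sizes chosen arbitrarily, names hypothetical) to compare the two on your own machine; the printed numbers will depend on your system. It also differs on ties: `max` over a set returns whichever tied value the set yields first (string hashing is randomized per process), while `Counter.most_common` keeps first-encountered order.

```python
from collections import Counter
from itertools import chain
import random
import string
import timeit

def most_common_count(lst):
    # Mayank Porwal's approach: one full lst.count() scan per distinct value (~O(n * k)).
    return max(set(lst), key=lst.count)

def most_common_counter(lst):
    # Counter approach: a single O(n) counting pass.
    return Counter(lst).most_common(1)[0][0]

# Hypothetical larger input: 100 rows of 100 items drawn from ~1,000 random 6-letter strings.
random.seed(0)
vocab = [''.join(random.choices(string.ascii_lowercase, k=6)) for _ in range(1000)]
rows = [[random.choice(vocab) for _ in range(100)] for _ in range(100)]
flat = list(chain.from_iterable(rows))

for fn in (most_common_count, most_common_counter):
    seconds = timeit.timeit(lambda: fn(flat), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```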

 


I'd suggest flattening the 2D array and then using a Counter to find the most frequent element.

```python
from collections import Counter

flat_list = [item for sublist in arr for item in sublist]
Counter(flat_list).most_common(1)[0]     # ('Mohit', 3)
Counter(flat_list).most_common(1)[0][0]  # 'Mohit'
```

Not sure if it is the fastest approach though.
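One caveat with indexing `most_common(1)[0]` directly: if there are no items at all (no sublists, or only empty ones), `most_common(1)` returns an empty list and `[0]` raises `IndexError`. A minimal guarded wrapper, with the hypothetical name `most_common_string`, might look like this; it also shows that ties go to the value encountered first.

```python
from collections import Counter

def most_common_string(rows):
    """Return the most frequent item across all sublists, or None if there are no items."""
    flat_list = [item for sublist in rows for item in sublist]
    top = Counter(flat_list).most_common(1)  # [] when flat_list is empty
    return top[0][0] if top else None

arr = [['Mohit', 'shini', 'Manoj', 'Mot'],
       ['Mohit', 'shini', 'Manoj'],
       ['Mohit', 'Vis', 'Nusrath']]
print(most_common_string(arr))                       # Mohit
print(most_common_string([[], []]))                  # None
print(most_common_string([['b', 'a'], ['a', 'b']]))  # b (tie: first encountered wins)
```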

Edit:

@timgeb's answer has a faster way to flatten the list, using itertools.chain.
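Written out as a runnable snippet, the `itertools.chain` variant from the timings in the question looks like this:

```python
from collections import Counter
from itertools import chain

arr = [['Mohit', 'shini', 'Manoj', 'Mot'],
       ['Mohit', 'shini', 'Manoj'],
       ['Mohit', 'Vis', 'Nusrath']]

# chain.from_iterable yields the items of each sublist in turn, flattening lazily
# without a Python-level nested comprehension.
top = Counter(chain.from_iterable(arr)).most_common(1)
print(top)        # [('Mohit', 3)]
print(top[0][0])  # Mohit
```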

A more space-efficient way, suggested by @schwobaseggl, passes a generator expression to Counter so the flattened list is never built:

```python
from collections import Counter

c = Counter(item for sublist in arr for item in sublist).most_common(1)  # [('Mohit', 3)]
c[0][0]  # 'Mohit'
```
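A variation on the same space-saving idea (not from the thread): if the rows themselves arrive one at a time, e.g. from a file or a generator, `Counter.update` can consume them incrementally, so neither the flattened list nor the full 2D list has to be held in memory; memory then grows with the number of distinct strings. The function name `most_common_streaming` is hypothetical.

```python
from collections import Counter

def most_common_streaming(rows):
    """Count items row by row; rows can be any iterable, e.g. a generator reading a file."""
    counts = Counter()
    for row in rows:
        counts.update(row)  # adds 1 to the count of each item in this row
    top = counts.most_common(1)
    return top[0][0] if top else None

arr = [['Mohit', 'shini', 'Manoj', 'Mot'],
       ['Mohit', 'shini', 'Manoj'],
       ['Mohit', 'Vis', 'Nusrath']]

# A generator of rows: only one row needs to be materialised at a time.
rows = (row for row in arr)
print(most_common_streaming(rows))  # Mohit
```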
