Find most common string in a 2D list

  • A+
Category:Languages

I have a 2D list:

arr = [['Mohit', 'shini','Manoj','Mot'],       ['Mohit', 'shini','Manoj'],       ['Mohit', 'Vis', 'Nusrath']] 

I want to find the most frequent element in the 2D list. In the above example, the most common string is 'Mohit'.

I know I can use brute force using two for loops and a dictionary to do this, but is there a more efficient way using numpy or any other library?

The nested lists could be of different lengths

Can someone also add the time of their methods? To find the fasted method. Also the caveats at which it might not be very efficient.

Edit

These are the timings of different methods on my system:

#timegb %%timeit collections.Counter(chain.from_iterable(arr)).most_common(1)[0][0] 5.91 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  #Kevin Fang and Curious Mind %%timeit flat_list = [item for sublist in arr for item in sublist] collections.Counter(flat_list).most_common(1)[0] 6.42 µs ± 501 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  %%timeit c = collections.Counter(item for sublist in arr for item in sublist).most_common(1)c[0][0] 6.79 µs ± 449 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  #Mayank Porwal def most_common(lst):     return max(set(lst), key=lst.count) %%timeit ls = list(chain.from_iterable(arr)) most_common(ls) 2.33 µs ± 42.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  #U9-Forward %%timeit l=[x for i in arr for x in i] max(l,key=l.count) 2.6 µs ± 68.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) 

Mayank Porwal's method runs the fastest on my system.

 


  1. Flatten the list with itertools.chain.from_iterable
  2. Apply a Counter.

Demo:

>>> from itertools import chain >>> from collections import Counter >>>  >>> lst = [['Mohit', 'shini','Manoj','Mot'], ...:      ['Mohit', 'shini','Manoj'], ...:      ['Mohit', 'Vis', 'Nusrath']] ...:       >>> Counter(chain.from_iterable(lst)).most_common(1)[0][0] 'Mohit' 

Details:

>>> list(chain.from_iterable(lst)) ['Mohit',  'shini',  'Manoj',  'Mot',  'Mohit',  'shini',  'Manoj',  'Mohit',  'Vis',  'Nusrath'] >>> Counter(chain.from_iterable(lst)) Counter({'Manoj': 2, 'Mohit': 3, 'Mot': 1, 'Nusrath': 1, 'Vis': 1, 'shini': 2}) >>> Counter(chain.from_iterable(lst)).most_common(1) [('Mohit', 3)] 

Some timings:

>>> lst = lst*100 >>> %timeit Counter(chain.from_iterable(lst)).most_common(1)[0][0] # timgeb 53.7 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) >>> %timeit max([x for i in lst for x in i], key=l.count) # U9-Forward 207 µs ± 389 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) >>> %timeit Counter([x for sublist in lst for x in sublist]).most_common(1)[0][0] # Curious_Mind/Kevin Fang #1 75.2 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) >>> %timeit Counter(item for sublist in lst for item in sublist).most_common(1)[0][0] # Kevin Fang #2 95.2 µs ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) >>> %timeit flat = list(chain.from_iterable(lst)); max(set(flat), key=flat.count) # Mayank Porwal 98.4 µs ± 178 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 

(Note that Kevin Fang's second solution is a bit slower than the first one, but more memory efficient.)

Comment

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen: