Count the total number of list elements in a pandas column


I have a pandas DataFrame A with a column keywords that looks like this (only 4 rows are shown here; the real data has millions):

keywords
['loans','mercedez','bugatti']
['trump','usa']
['galaxy','7s','canon','macbook']
['beiber','spiderman','marvels','ironmen']

I want to sum the total number of list elements in the keywords column and store the result in a variable, something like:

total_sum = elements in keywords[0] + elements in keywords[1] + elements in keywords[2] + elements in keywords[3]
total_sum = 3 + 2 + 4 + 4
total_sum = 13

How can I do this in pandas?

 


Using sum and map:

sum(map(len, df.keywords)) 
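Applied to the four rows shown in the question, the same one-liner reproduces the expected total of 13. This is a minimal sketch; the frame name A and column name keywords come from the question, and per_row is only an illustrative intermediate.

```python
import pandas as pd

# The four rows from the question (the real frame has millions of rows)
A = pd.DataFrame({
    'keywords': [
        ['loans', 'mercedez', 'bugatti'],
        ['trump', 'usa'],
        ['galaxy', '7s', 'canon', 'macbook'],
        ['beiber', 'spiderman', 'marvels', 'ironmen'],
    ]
})

# len() gives the size of each list; sum() adds those sizes up
per_row = list(map(len, A['keywords']))  # [3, 2, 4, 4]
total_sum = sum(per_row)
print(total_sum)  # 13
```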

Sample

import pandas as pd

df = pd.DataFrame({
    'keywords': [['a', 'b', 'c'], ['c', 'd'], ['a', 'b', 'c', 'd'], ['g', 'h', 'i']]
})

sum(map(len, df.keywords))

12 

Timings

df = pd.concat([df]*10000)

%timeit sum(map(len, df.keywords))
1.87 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.keywords.map(len).sum()
13.5 ms ± 661 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.keywords.str.len().sum()
14.3 ms ± 272 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
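%timeit is an IPython/Jupyter magic, so it does not work in a plain .py script. A rough equivalent with the standard-library timeit module might look like this; the repeat/number counts are arbitrary choices, and the absolute numbers will differ from the ones above depending on the machine and pandas version.

```python
import timeit

import pandas as pd

# Same sample frame as above, repeated 10,000 times (40,000 rows)
df = pd.DataFrame({
    'keywords': [['a', 'b', 'c'], ['c', 'd'], ['a', 'b', 'c', 'd'], ['g', 'h', 'i']]
})
df = pd.concat([df] * 10000)

candidates = {
    'sum(map(len, ...))': lambda: sum(map(len, df.keywords)),
    'Series.map(len).sum()': lambda: df.keywords.map(len).sum(),
    'Series.str.len().sum()': lambda: df.keywords.str.len().sum(),
}

results = {}
for name, fn in candidates.items():
    results[name] = fn()  # keep each total so the methods can be compared
    # best of 3 repeats, 10 calls each, converted to time per single call
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name:<25} {best * 1e3:.2f} ms per call")
```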

Validation

>>> sum(map(len, df.keywords)) == df.keywords.map(len).sum() == df.keywords.str.len().sum()
True
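The three expressions agree on a clean column, but they behave differently if some rows hold a missing value (NaN) instead of a list, which is a plausible assumption for a frame with millions of rows. The sketch below uses a small hypothetical Series s: len() raises TypeError on NaN, while .str.len() returns NaN for that row and .sum() skips it by default.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value in the middle row
s = pd.Series([['a', 'b'], np.nan, ['c']])

# len(nan) raises TypeError, so sum(map(len, s)) fails on this column
map_len_failed = False
try:
    sum(map(len, s))
except TypeError as exc:
    map_len_failed = True
    print("map(len) failed:", exc)

# .str.len() gives NaN for the missing row; .sum() skips NaN by default
via_str = s.str.len().sum()  # 3.0 (float because of the NaN)

# Plain-Python alternative: count only the entries that are actually lists
via_filter = sum(len(x) for x in s if isinstance(x, list))  # 3
```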

A bit of a disclaimer: using pandas methods on columns that contain lists is always going to be inefficient (which is why the plain-Python approach is so much faster here), since DataFrames are not meant to store lists. Try to avoid storing lists in a column whenever possible.
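One way to follow that advice, assuming the data can be reshaped, is to store one keyword per row (long format) using DataFrame.explode (available since pandas 0.25). This is a sketch of that swapped-in technique, not the answer's method. An empty list becomes a single NaN row after explode, so count(), which skips NaN, is used instead of len(); note that it would also skip any None/NaN values stored inside a list.

```python
import pandas as pd

df = pd.DataFrame({
    'keywords': [['a', 'b', 'c'], ['c', 'd'], ['a', 'b', 'c', 'd'], ['g', 'h', 'i']]
})

# One row per keyword; the original row label is repeated for each element
long = df.explode('keywords')

# Total number of keywords, ignoring NaN rows produced by empty lists
total = long['keywords'].count()  # 12

# Per-row counts can be recovered by grouping on the original row label
per_row = long.groupby(level=0)['keywords'].count()
```

Once the data is in this shape, the total is an ordinary column operation and no Python-level len() call per row is needed.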
