Count number of words per row


I'm trying to create a new column in a dataframe that contains the word count for the respective row. I'm looking for the total number of words, not the frequency of each distinct word. I assumed there would be a simple, quick way to do this common task, but after googling around and reading a handful of SO posts (1, 2, 3, 4) I'm stuck. I've tried the solutions put forward in the linked SO posts, but I get lots of attribute errors back.

words = df['col'].split()
df['totalwords'] = len(words)

results in

AttributeError: 'Series' object has no attribute 'split' 


f = lambda x: len(x["col"].split()) -1
df['totalwords'] = df.apply(f, axis=1)

results in

AttributeError: ("'list' object has no attribute 'split'", 'occurred at index 0') 

Option 1
str.split + str.len
str.split turns each row into a list of words, and str.len then returns the length of that list. str.len works nicely for any non-numeric column.

df['totalwords'] = df['col'].str.split().str.len() 
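
As a minimal sketch of Option 1 on a small frame (the sample data below is made up for illustration; only the column names `col` and `totalwords` come from the question):

```python
import pandas as pd

# Hypothetical sample data; 'col' holds plain strings.
df = pd.DataFrame({"col": ["the quick brown fox", "hello world", "single", ""]})

# Split each string on whitespace, then take the length of each resulting list.
df["totalwords"] = df["col"].str.split().str.len()

print(df)
```

Note that an empty string splits into an empty list, so it is counted as zero words.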

Option 2
If your words are separated by exactly one space, with no leading or trailing spaces, you may simply count the spaces and add 1.

df['totalwords'] = df['col'].str.count(' ') + 1 
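
To see where the space-counting shortcut can go wrong, here is a small comparison against the split-based count. The sample strings are invented for illustration, and `by_spaces` and `by_split` are just names chosen for this sketch:

```python
import pandas as pd

# Hypothetical rows: clean text, a double space, and padded text.
df = pd.DataFrame({"col": ["one two three", "double  space", " padded "]})

# Option 2: count spaces and add 1 (assumes exactly one space between words).
by_spaces = df["col"].str.count(" ") + 1

# Option 1 for comparison: split on any run of whitespace.
by_split = df["col"].str.split().str.len()

print(pd.DataFrame({"col": df["col"], "by_spaces": by_spaces, "by_split": by_split}))
```

The two counts agree only on the first row; repeated or surrounding spaces inflate the space-based count.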

Option 3
List comprehension
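For string operations like this, a plain list comprehension is often faster than you might expect, so it is worth timing against the .str methods on your own data.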

df['totalwords'] = [len(x.split()) for x in df['col'].tolist()] 
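
The second error in the question ('list' object has no attribute 'split') suggests that some cells may already hold lists rather than strings. A hedged sketch of a list comprehension that tolerates that case follows; `count_words` is a hypothetical helper, and treating missing values as zero words is an assumption, not something stated in the question:

```python
import pandas as pd

def count_words(value):
    """Hypothetical helper: word count for a string, a pre-split list, or a missing value."""
    if isinstance(value, str):
        return len(value.split())   # split on any whitespace
    if isinstance(value, (list, tuple)):
        return len(value)           # already tokenized: count the items
    return 0                        # assumption: None/NaN count as zero words

# Hypothetical mixed column: a string, an already-split list, and a missing value.
df = pd.DataFrame({"col": ["a b c", ["already", "split"], None]})
df["totalwords"] = [count_words(x) for x in df["col"].tolist()]

print(df)
```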

