Replace values in pandas column with default value for missing keys


I have multiple simple functions that need to be applied to every row of certain columns of my dataframe. The dataframe is very large, 10 million+ rows. It looks something like this:

    Date       location  city         number  value
    12/3/2018  NY        New York     2       500
    12/1/2018  MN        Minneapolis  3       600
    12/2/2018  NY        Rochester    1       800
    12/3/2018  WA        Seattle      2       400
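To reproduce the problem, the sample rows above can be built as a DataFrame. The repeat step only makes a bigger frame for timing experiments; the name big and the 25_000 factor are arbitrary choices for illustration (the real data has over 10 million rows):

```python
import pandas as pd

# The four sample rows from the table above.
df = pd.DataFrame({
    "Date": ["12/3/2018", "12/1/2018", "12/2/2018", "12/3/2018"],
    "location": ["NY", "MN", "NY", "WA"],
    "city": ["New York", "Minneapolis", "Rochester", "Seattle"],
    "number": [2, 3, 1, 2],
    "value": [500, 600, 800, 400],
})

# Hypothetical larger frame for timing: each row repeated 25,000 times.
big = df.loc[df.index.repeat(25_000)].reset_index(drop=True)
print(len(big))
```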

I have functions like these:

    def normalized_location(row):
        # Note: " Minneapolis" (with a leading space) would never match the data
        if row['city'] == "Minneapolis":
            return "FCM"
        elif row['city'] == "Seattle":
            return "FCS"
        else:
            return "Other"

and then I use:

    df['Normalized Location'] = df.apply(normalized_location, axis=1)

This is extremely slow. How can I make it more efficient?

 


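We can make this much faster using map with a defaultdict. df.apply(..., axis=1) builds a separate Series object for every row before calling your function, and that per-row overhead is what makes it so slow on 10 million rows; Series.map only has to look up one city value per row.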

    from collections import defaultdict

    d = defaultdict(lambda: 'Other')
    d.update({"Minneapolis": "FCM", "Seattle": "FCS"})

    df['normalized_location'] = df['city'].map(d)

    print(df)

            Date location         city  number  value normalized_location
    0  12/3/2018       NY     New York       2    500               Other
    1  12/1/2018       MN  Minneapolis       3    600                 FCM
    2  12/2/2018       NY    Rochester       1    800               Other
    3  12/3/2018       WA      Seattle       2    400                 FCS
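One side effect of the defaultdict worth knowing: when a city that is not a key gets looked up, defaultdict inserts that city into d with the default value, so d ends up holding every distinct city in the column. That is harmless for a handful of cities, but it matters if d is reused or the column has many distinct values. A small sketch showing this, plus a non-mutating variant that maps with a callable using dict.get (the names mapping and normalized_location_get are just for illustration):

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({"city": ["New York", "Minneapolis", "Rochester", "Seattle"]})
mapping = {"Minneapolis": "FCM", "Seattle": "FCS"}

# defaultdict seeded with the same two keys as in the answer.
d = defaultdict(lambda: "Other", mapping)
df["normalized_location"] = df["city"].map(d)

# Every missing city that was looked up has now been stored in d.
print(dict(d))

# Non-mutating alternative: a callable that falls back to "Other" via dict.get.
df["normalized_location_get"] = df["city"].map(lambda c: mapping.get(c, "Other"))
print(mapping)  # still only the two original keys
```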

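The defaultdict is used instead of a plain dict so that any city without a key gets 'Other' directly, circumventing a fillna call for performance reasons. This approach generalises easily to multiple replacements: just add more keys to the dictionary.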
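Adding more replacements only means adding keys. For comparison, here is the plain-dict variant with the fillna call that the defaultdict avoids; the extra cities and codes (FCN, FCB, Boston) are made up for illustration. As far as I know, pandas turns a plain dict into a lookup that runs over the whole column at once, but looks up a dict subclass that defines __missing__ (such as defaultdict) one value at a time, so on 10 million rows it is worth timing both variants rather than assuming either one wins:

```python
import pandas as pd

df = pd.DataFrame({"city": ["New York", "Minneapolis", "Rochester", "Seattle", "Boston"]})

# More replacements are just more keys (extra entries are hypothetical).
mapping = {
    "Minneapolis": "FCM",
    "Seattle": "FCS",
    "New York": "FCN",
    "Boston": "FCB",
}

# Plain dict: cities with no key come back as NaN, and fillna supplies the default.
df["normalized_location"] = df["city"].map(mapping).fillna("Other")
print(df)
```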
