Efficient way to assign values from another column pandas df


I'm trying to write a more efficient script that creates a new column based on the values in another column. The script below does this, but it only handles one selected string at a time. I'd like it to handle every unique value.

For the df below, I currently run the script once for each individual string in Location. Instead, I want to run it on all unique strings in one pass.

How the new column is assigned: for each value in Location, the unique values in Day are taken in order of first appearance and split into blocks of 3. Each block gets its own new string, so the first 3 unique days for a Location share one value, the next 3 share another, and so on.

    import pandas as pd
    import numpy as np

    d = ({
        'Day' : ['Mon','Tues','Wed','Wed','Thurs','Thurs','Fri','Mon','Sat','Fri','Sun'],
        'Location' : ['Home','Home','Away','Home','Away','Home','Home','Home','Home','Away','Home'],
        })

    df = pd.DataFrame(data=d)

    #Select value
    mask = df['Location'] == 'Home'
    df1 = df[mask].drop_duplicates('Day')
    d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))

    df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)

At the moment I'm selecting each value in ['Location'], e.g. mask = df['Location'] == 'Home'.

I want to do this for all values, i.e. something like mask = df['Location'] == <each unique value>.

Intended Output:

          Day Location Assign
    0     Mon     Home     C1
    1    Tues     Home     C1
    2     Wed     Away     C2
    3     Wed     Home     C1
    4   Thurs     Away     C2
    5   Thurs     Home     C3
    6     Fri     Home     C3
    7     Mon     Home     C1
    8     Sat     Home     C3
    9     Fri     Away     C2
    10    Sun     Home     C4

 


You can use:

    def f(x):
        #get unique days
        u = x['Day'].unique()
        #mapping dictionary
        d = dict(zip(u, np.arange(len(u)) // 3 + 1))
        x['new'] = x['Day'].map(d)
        return x

    #group_keys=False keeps the original row index; on pandas 2.0+ the default
    #would prepend Location as an extra index level and break the assignment below
    df = df.groupby('Location', sort=False, group_keys=False).apply(f)
    #add Location column
    s = df['new'].astype(str) + df['Location']
    #encoding by factorize
    df['new'] = pd.Series(pd.factorize(s)[0] + 1).map(str).radd('C')
    print (df)

          Day Location new
    0     Mon     Home  C1
    1    Tues     Home  C1
    2     Wed     Away  C2
    3     Wed     Home  C1
    4   Thurs     Away  C2
    5   Thurs     Home  C3
    6     Fri     Home  C3
    7     Mon     Home  C1
    8     Sat     Home  C3
    9     Fri     Away  C2
    10    Sun     Home  C4
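If you would rather avoid groupby.apply, the same idea can be sketched with groupby.transform on the Day column alone. This is only an alternative under a few assumptions: the helper name day_block and its size parameter are made up for illustration, the '_' separator in the combined key is arbitrary, and the result column is called new as above.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Mon', 'Tues', 'Wed', 'Wed', 'Thurs', 'Thurs',
            'Fri', 'Mon', 'Sat', 'Fri', 'Sun'],
    'Location': ['Home', 'Home', 'Away', 'Home', 'Away', 'Home',
                 'Home', 'Home', 'Home', 'Away', 'Home'],
})

def day_block(days, size=3):
    # Hypothetical helper: number the unique days 0, 1, 2, ... in order of
    # first appearance, then bin every `size` consecutive unique days together.
    codes = pd.factorize(days)[0]
    return pd.Series(codes // size + 1, index=days.index)

# Block number within each Location, aligned with the original row index
block = df.groupby('Location', sort=False)['Day'].transform(day_block)

# One global label per distinct (Location, block) pair, in order of first appearance
key = df['Location'] + '_' + block.astype(str)
labels = pd.Series(pd.factorize(key)[0] + 1, index=df.index)
df['new'] = 'C' + labels.astype(str)

print(df)
```

Because transform returns a result aligned with the original index, df is never reassigned and the row order stays untouched. Changing size in day_block changes how many unique days share one label.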
